[comp.protocols.tcp-ip.domains] Experimental DNS RFC

rdhobby@UCDAVIS.EDU (Russ Hobby) (04/09/91)

The following document has been sent to the RFC Editor to be an
Experimental RFC (as opposed to being on the standards track).  It is
along the lines of the MX Record discussion that has been going on. The
RFC Editor has given one week (until Apr 15) to review the document and
to say if it is a "good thing".

As an experimental RFC the specs are there for people to try it and get
some experience with the "experiment". Since there is a Working Group
for DNS, the WG has the opportunity to review the document before
publication and say if it fits into the plans of the WG. If the WG
thinks that experimental experience will be good, then fine. If the WG
has suggestions to the author before making it an experimental RFC,
that can be done as well.  If the WG thinks that this is something that
should be put on the standards track now, the experimental RFC can be
redirected to the WG for review and on to becoming a Proposed Standard
RFC in a timely manner.  If it goes on to be an experimental RFC now,
it can be put onto the standards track by the WG at a later date.
(whew, made it through all that ;-)

Send your comments to me <rdhobby@ucdavis.edu> and Greg Vaudreuil
<gvaudre@nri.reston.va.us> (since I will be on vacation starting
Saturday) and, of course, the WG mail list.


Russ Hobby                              INTERNET: rdhobby@ucdavis.edu  
IETF Area Director - Applications       BITNET:   RDHOBBY@UCDAVIS  
                                        UUCP:  ...!ucbvax!ucdavis!rdhobby 
--------------------------------------------------------------------------

Network Working Group                                       T. P. Brisco
Request for Comments: 12XX                            Rutgers University
Updates: RFCs 1034, 1035                                      April 1991


                        LMX DNS Resource Record

Status of This Memo

   This memo defines an additional Domain Name System Resource
   Record.  This RFC specifies an Experimental Protocol and requests
   discussion and suggestions for improvements.  Please refer to the
   current edition of the "IAB Official Protocol Standards" for the
   standardization state and status of this protocol.  Distribution of
   this memo is unlimited.

1. Overview

   This memo is intended to standardize a method for the determination
   of local mail addresses for use within an organization only.  The
   Domain Name System Resource Record detailed herein is designed for
   use from a mail gateway to client machines only.

2. Introduction

   This memo proposes an extension of RFC1035 [Domain Names -
   Implementation and Specification].  The extension provides a Domain
   Name System ("DNS") Resource Record ("RR") for the addressing of
   local systems for mail redistribution.

   With increased levels of network security becoming commonplace, it
   is not unusual to find mail destined for a particular domain (or set
   of domains) routed through a single addressable machine (sometimes
   known as a "mail gateway").  With the increased level of security,
   it may be impossible for hosts on a subnet to communicate with the
   rest of the Internet community at all.

   This DNS RR provides a way for these systems on restricted subnets
   to exchange mail with hosts external to the addressable networks.

3. The LMX RR

   The LMX resource record became necessary in order to support the
   concept of "restricted networks".  These networks typically contain
   hosts that present minor security problems, usually because no user
   authentication is necessary or possible.  Examples include the
   public-access microcomputer laboratories in a typical computing
   center.  Hosts in these laboratories may not be able to send packets
   to networks outside of the autonomous system, effectively rendering
   these systems incapable of establishing connections to the "outside
   world".  However, users may wish to originate or receive mail from
   hosts on this restricted network.

   Typically, an organization may have a designated "mail gateway"
   through which all mail, inbound and outbound, passes.  For mail
   passing from within the organizational network to external networks,
   there is typically no problem: all hosts (except the gateway) forward
   their mail to the gateway, which in turn re-sends it to the indicated
   user on the specified host.  For inbound mail, however, the gateway
   will be unable to resolve any additional Mail Exchanger for the
   destination system.  For instance, assume that some host
   "public.rutgers.edu" exists on a publicly accessible network and may
   not establish connections to machines outside of the autonomous
   system.  To the external world, an MX record is announced for
   "public.rutgers.edu" as "gateway.rutgers.edu".  Inbound mail will
   arrive at "gateway.rutgers.edu" for redelivery to
   "public.rutgers.edu".  However, since the MX record is already in use
   to advertise "gateway" as the mail exchanger, the gateway has no way
   of resolving an address for the local system.  In effect, a private,
   "local" MX is necessary in order to resolve an address.

   The LMX ("Local Mail eXchanger") record is for use within the
   organization's autonomous system (since the address specified by the
   LMX will probably not be addressable from external networks).  It is
   the mechanism by which the mail gateway may determine an address for
   a host on the local network.  A mail gateway that receives a
   message bound for a host for which it is the mail exchanger (i.e.,
   the gateway's own host name is specified in the MX record) may
   attempt to retrieve an LMX record to determine the local address
   accepting mail for this host.
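
   As an illustration of the gateway-side lookup just described, here
   is a minimal sketch in Python.  It assumes in-memory dictionaries in
   place of real DNS queries, and the internal name
   "public.local.rutgers.edu" is hypothetical.  The memo does not say
   what a gateway should do when no LMX exists, so the sketch simply
   reports that no local exchanger was found.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory stand-ins for DNS data: name -> [(weight, host)].
MX_RECORDS: Dict[str, List[Tuple[int, str]]] = {
    "public.rutgers.edu": [(10, "gateway.rutgers.edu")],
}
LMX_RECORDS: Dict[str, List[Tuple[int, str]]] = {
    "public.rutgers.edu": [(0, "public.local.rutgers.edu")],  # hypothetical
}

GATEWAY = "gateway.rutgers.edu"


def local_exchanger(dest: str, me: str = GATEWAY) -> Optional[str]:
    """Return the internal host to relay mail for `dest` to, or None."""
    dest = dest.lower()  # LMX data is case insensitive
    exchangers = {host.lower() for _, host in MX_RECORDS.get(dest, [])}
    if me.lower() not in exchangers:
        return None  # only the advertised mail exchanger consults LMX
    candidates = sorted(LMX_RECORDS.get(dest, []))
    # Lowest weight first, mirroring MX preference (an assumption).
    return candidates[0][1] if candidates else None
```

   A host that is not the advertised exchanger gets None, which keeps
   the LMX data confined to the gateway role the memo describes.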

4. Format of the LMX RR

   The LMX is a DNS resource record with type code XX (to be assigned
   by the IANA); the data specified in it is case insensitive.  The LMX
   has the following format:

   <ehostname> <ttl> <class> LMX <weight> <lhostname>

   Both RDATA fields are required in all LMX RRs.  The <ehostname> is
   the domain name of the external name by which the host is known.  The
   <lhostname> is the domain name of the internal name by which the host
   is known.  LMX records cause type A additional section processing for
   <lhostname>.

   Note that the format and handling (by the DNS) of the LMX is exactly
   identical to that of the MX record.  LMX RRs should be exported by
   the DNS, in order for secondary nameservers to back up a site
   properly.
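
   The master-file form given above can be read mechanically, and since
   LMX handling mirrors MX handling, a natural reading is that a lower
   <weight> is preferred, as with MX preference.  The sketch below
   assumes that reading, assumes the TTL and class fields are always
   present (BIND allows them to be omitted), and uses hypothetical
   names throughout.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class LMX:
    ehostname: str
    ttl: int
    rrclass: str
    weight: int
    lhostname: str


def parse_lmx(line: str) -> LMX:
    """Parse '<ehostname> <ttl> <class> LMX <weight> <lhostname>'."""
    fields = line.split()
    if len(fields) != 6 or fields[3].upper() != "LMX":
        raise ValueError(f"not an LMX record: {line!r}")
    ehost, ttl, rrclass, _, weight, lhost = fields
    # The data is case insensitive, so names and class are normalised.
    return LMX(ehost.lower(), int(ttl), rrclass.upper(), int(weight),
               lhost.lower())


def by_preference(records: Iterable[LMX]) -> List[LMX]:
    """Order records as MX RRs are ordered: lowest weight first."""
    return sorted(records, key=lambda r: r.weight)
```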

5. Security Considerations

   Security issues are not discussed in this memo.

6. Author's Address

   Thomas P. Brisco
   Rutgers University
   Computing Services
   Hill Center for the Mathematical Sciences
   Busch Campus
   P.O. Box 879
   Piscataway, New Jersey 08855-0879

   Phone: 908-932-2351

   EMail: brisco@RUTGERS.EDU


rdhobby@UCDAVIS.EDU (Russ Hobby) (04/09/91)

The following document has been sent to the RFC Editor to be an
Experimental RFC (as opposed to being on the standards track).
The RFC Editor has given one week (until Apr 15) to review the document
and to say if it is a "good thing".

As an experimental RFC the specs are there for people to try it and get
some experience with the "experiment". Since there is a Working Group
for DNS, the WG has the opportunity to review the document before
publication and say if it fits into the plans of the WG. If the WG
thinks that experimental experience will be good, then fine. If the WG
has suggestions to the author before making it an experimental RFC,
that can be done as well.  If the WG thinks that this is something that
should be put on the standards track now, the experimental RFC can be
redirected to the WG for review and on to becoming a Proposed Standard
RFC in a timely manner.  If it goes on to be an experimental RFC now,
it can be put onto the standards track by the WG at a later date.

Send your comments to me <rdhobby@ucdavis.edu> and Greg Vaudreuil
<gvaudre@nri.reston.va.us> (since I will be on vacation starting
Saturday) and, of course, the WG mail list.


Russ Hobby                              INTERNET: rdhobby@ucdavis.edu  
IETF Area Director - Applications       BITNET:   RDHOBBY@UCDAVIS  
                                        UUCP:  ...!ucbvax!ucdavis!rdhobby 
-------------------------------------------------------------------------

Network Working Group                                          T. Brisco
Request for Comments: 12XX                            Rutgers University
Updates: RFCs 1034, 1035                                      April 1991


                        CIP DNS Resource Record

Status of This Memo

   This RFC defines an extension to the DNS [RFC1035] by defining an
   additional Domain Name System Resource Record.  This RFC
   specifies an Experimental Protocol and requests discussion and
   suggestions for improvements.  Please refer to the current edition of
   the "IAB Official Protocol Standards" for the standardization state
   and status of the protocol.  Distribution of this memo is unlimited.

1. Introduction

   This memo proposes an extension to RFC1035 [Domain Names -
   Implementation and Specification].  The extension is a generalized
   solution to the problem of adequately distributing usage of resources
   across a series of machines in a "cluster" configuration.  The
   extensions allow the binding of a single name to a series of
   machines.

2. Description of The Problem

   In current medium and large scale computer centers, a series of
   mini- or micro-computers is frequently configured so that its members
   are functionally identical to each other; that is, a user can log in
   to any one of them and be unable to tell any functional difference
   between it and another member of the cluster.  These
   configurations are typically diskless workstations operating as
   "clients" from a server using NFS and other protocols to provide a
   consistent environment to the user across a series of actual
   machines, or they may be a series of larger time sharing machines
   clustered together in order to maximize utilization of resources
   (such as disk space).  In all cases, however, all members of the
   cluster provide the same resources that any other member of the
   cluster may.

   In situations where workstations are used, there is rarely a problem
   finding a machine to work on; users will find a workstation that
   no-one else is currently using.  However, when accessing workstations
   over the network, or when accessing time sharing machines clustered
   together, it has been observed that users tend to "bunch up" on a
   particular machine.  In one particular installation, it was noted
   that one machine typically had more users since its host name was
   easier to spell.

   In order to adequately distribute users across a series of clustered
   machines, it becomes necessary to extend the concept of a single name
   bound to a single machine.  While there exist facilities to bind
   multiple names to single machines, there is no convenient way of
   binding a single name to multiple machines.

   However, merely binding a single name to multiple machines will not
   solve the problem at hand - distributing resource utilization with
   some respect to resource availability.  There are two ways to think
   about this distribution: the method can be either sentient or
   non-sentient.  That is, the method of distributing the utilization
   can either be aware of the current demands on the resources of a
   cluster member or not.  In the case of non-sentiency, a pseudo-random
   method of assigning a user (resource utilizer) to a machine (resource
   provider) can be used to achieve a somewhat granular, but even,
   distribution.
   In the case of sentiency, the entity making the binding between the
   utilizer and the provider must be aware of the current utilization of
   the resource provider.

3. The CIP DNS RR

   This memo provides an extension to RFC 1035 in order to provide a
   simple, non-sentient method of distributing utilizers amongst
   providers.  This method is not meant to be knowledgeable about the
   resource utilization on the hosts involved; rather, it is meant to be
   a simple method of randomly distributing utilization across a series
   of providers.  The benefit of these records is that they are easily
   implemented and can be utilized without any modifications to
   existing utilities.  With a random distribution of utilizers across
   a series of resources, an approximation of utilization balancing can
   be achieved with a minimum amount of effort.

   The extension is implemented via a new RR type:

               TYPE    value   meaning
               ----    -----   -------------------------
               CIP      XX     Clustered Internet Pointer

   The CIP resource records (RRs) define a series of names that make up
   a cluster of resources.  When responding to requests, the response is
   a pseudo-random choice of any of the RRs.

   When a request for an RR comes into the DNS server, the server should
   first search for the named RR.  If the RR is not found, the server
   should then search for a CIP RR.  If a CIP RR is found, the CIP RR
   should be converted into a CNAME RR, and normal CNAME processing
   should then ensue.  In order to prevent caching by other nameservers,
   the CNAME RR should have a TTL of 1; however, the RR found in the
   additional information should have a TTL as defined in RFC1035.  If
   the particular RR is found to be associated with a cluster name, no
   CIP processing should occur, and the RR should be returned
   immediately (note that it does not make sense to associate an A
   record with a cluster).  For compatibility with software conforming
   only to RFC1035, when the DNS server responds after finding a CIP RR
   and the requested RR, the response should indicate that the binding
   between the cluster name and the RR is that of a CNAME.

   The only time when a CIP RR should be returned is when the requested
   RR is of the types CIP, "any" or "*".  In this case ALL RRs (CIP or
   otherwise) should be returned.  This is to support future DNS
   implementations that may support a more "sentient" method of
   determining host selection.

   Each time that a cluster name with CIP RRs is processed, the CIP RRs
   should be reordered using some pseudo-random algorithm.  However,
   implementors are warned that the algorithm should be as fast as
   possible since the lookup of RRs is usually time critical.  In the
   initial implementation the author used a simple round-robin
   algorithm.
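
   A round-robin reordering of the kind the author used can be done in
   constant time per lookup.  The following Python sketch rotates a
   double-ended queue of CIP targets; the Cluster class is hypothetical
   and is not taken from the author's implementation.  A server wanting
   a genuinely pseudo-random order could substitute random.choice.

```python
from collections import deque
from typing import Iterable


class Cluster:
    """Round-robin ordering of the CIP targets for one cluster name."""

    def __init__(self, targets: Iterable[str]) -> None:
        self._ring = deque(targets)

    def next_target(self) -> str:
        # Answer with the head of the ring, then move it to the tail, so
        # each lookup reorders the records in O(1) time.
        target = self._ring[0]
        self._ring.rotate(-1)
        return target
```

   For example, a cluster built from rsrc1 through rsrc4 hands out
   rsrc1, rsrc2, rsrc3, rsrc4 and then starts again at rsrc1.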

   Since CNAME RRs may point at a CNAME RR, CIP RRs may point at other
   clusters.  The ability to define clusters of clusters is inherent in
   the CIP RR processing, since at any given level the DNS is only
   resolving CNAME RRs.  However, this method of resolving clusters
   leads to some inherent ambiguity, so it is necessary to define to a
   certain extent how the CNAME processing should be handled.  For
   example, when an MX RR lookup is attempted on a cluster and the first
   CIP, at the next level of resolving, has no MX RR, should the DNS
   server check the next CIP in the cluster sequence or return the A RR
   associated with the resolved CIP?  The general rule of thumb should
   be that at any level of the resolving, the CIP RR processing should
   be treated as CNAME RR processing.  If the requested RR does not
   exist with the host information specified by the CNAME, a failure
   should be returned for the lookup of the initial record.

                      |
                      |
                      v
                   does RR exist?  (yes) ----> return RR
                    (no)
                      |
                      |
                      v
                 does CIP RR exist? (no) ----> return FAIL
                    (yes)
                      |
                      |
                      |
                      v
                 randomize and retrieve CIP
                      |
                      |
                      v
                convert CIP to CNAME
                    set TTL to 1
                   resolve CNAME RR
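
   The flowchart above, including the treatment of clusters of clusters
   as nested CNAME processing, can be sketched as a recursive lookup.
   This is a minimal Python illustration under stated assumptions: the
   zone is a hypothetical dictionary of (type, data) pairs, every
   ordinary RR is given an assumed TTL of 3600, a depth limit (not in
   the memo) guards against clusters that point at each other, and the
   "any" and "*" query types are not handled.

```python
import random
from typing import Callable, Dict, List, Tuple

Record = Tuple[str, str]            # (type, data), e.g. ("A", "128.6.7.38")
Answer = Tuple[str, int, str, str]  # (owner, ttl, type, data)
Zone = Dict[str, List[Record]]

DEFAULT_TTL = 3600  # assumed TTL for ordinary RRs
MAX_DEPTH = 8       # assumed guard against clusters that point at each other


def resolve(zone: Zone, name: str, qtype: str,
            pick: Callable[[List[str]], str] = random.choice,
            depth: int = 0) -> List[Answer]:
    """Follow the CIP flowchart; an empty list stands for FAIL."""
    if depth > MAX_DEPTH:
        return []
    rrs = zone.get(name, [])
    # Does the requested RR exist?  If so, return it with no CIP processing.
    found = [(name, DEFAULT_TTL, t, d) for t, d in rrs if t == qtype]
    if found:
        return found
    # Does a CIP RR exist?  If not, the lookup fails.
    targets = [d for t, d in rrs if t == "CIP"]
    if not targets:
        return []
    # Choose a target, present it as a CNAME with TTL 1, then resolve the
    # CNAME as usual; nested clusters simply recurse one level deeper.
    target = pick(targets)
    rest = resolve(zone, target, qtype, pick, depth + 1)
    if not rest:
        return []  # the chosen target lacks the RR: the whole lookup fails
    return [(name, 1, "CNAME", target)] + rest
```

   With this reading, an MX query on a cluster whose chosen member has
   no MX fails outright rather than trying the next CIP, which follows
   the rule of thumb given earlier.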

   Since it is possible that administrators may cluster together
   machines of varying power, there is an optional parameter to the CIP
   RR indicating the respective "weight" of a host associated with a CIP
   RR.  This is to allow particular resource providers to be "found more
   frequently" than others.  This parameter defines, essentially, how
   many times the CIP record is found in the cluster.  It is not an
   error for the same CIP record to occur twice in a cluster.  If no
   weight is indicated for the CIP RR, then a weight of 1 is assumed.

   The full syntax of the CIP RR is as follows:

           IN      CIP     name [weight]
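
   The weight semantics (a weight of N behaves as if the CIP record
   appeared N times, with 1 as the default) can be sketched as follows.
   The requirement that a weight be a positive integer is an assumption
   of this sketch; the memo does not state a range.

```python
from typing import List, Tuple


def parse_cip(rdata: str) -> Tuple[str, int]:
    """Parse CIP RDATA of the form 'name [weight]'; weight defaults to 1."""
    fields = rdata.split()
    if len(fields) not in (1, 2):
        raise ValueError(f"bad CIP RDATA: {rdata!r}")
    weight = int(fields[1]) if len(fields) == 2 else 1
    if weight < 1:  # assumed constraint, not stated in the memo
        raise ValueError("CIP weight must be a positive integer")
    return fields[0], weight


def expand(rdatas: List[str]) -> List[str]:
    """Treat a weight of N as N copies of the CIP record in the cluster."""
    ring: List[str] = []
    for rdata in rdatas:
        name, weight = parse_cip(rdata)
        ring.extend([name] * weight)
    return ring
```

   Expanding weights this way places all copies of a heavy host next to
   each other, so a strict round robin over the expanded list would hand
   out that host several times in a row; an implementation might prefer
   to interleave the copies.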

4. Examples of Configuration

   For clarity, examples utilizing the BIND implementation of DNS
   follow.

   ------

   An average entry might look like:

           cluster in      cip     rsrc1
                   in      cip     rsrc2
                   in      cip     rsrc3
                   in      cip     rsrc4
           rsrc1   in      a       128.6.7.38
           rsrc2   in      a       128.6.18.34
           rsrc3   in      a       128.6.4.4
           rsrc4   in      a       128.6.7.39

   This would cause the name "cluster" to be resolved to the addresses
   associated with rsrc1, rsrc2, rsrc3, and rsrc4 with fairly equal
   distribution.

   ------

   A cluster entry for machines of different power might look like:

           cluster in      cip     vax8650 7
                   in      cip     vax750 3
                   in      cip     vax730
           vax8650 in      a       128.6.61.3
           vax750  in      a       128.6.3.27
           vax730  in      a       128.6.1.10

   This would cause the "vax8650" to be found (on the average) 7 times
   as frequently as the "vax730", and a little more than twice (7/3
   times) as frequently as the "vax750".
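
   The expected shares follow directly from the weights: with the
   default weight of 1 for "vax730", the cluster holds 7 + 3 + 1 = 11
   slots.  A short Python check using exact fractions:

```python
from fractions import Fraction

weights = {"vax8650": 7, "vax750": 3, "vax730": 1}
total = sum(weights.values())  # 11 slots in the cluster
share = {host: Fraction(w, total) for host, w in weights.items()}

ratio_to_730 = share["vax8650"] / share["vax730"]  # exactly 7
ratio_to_750 = share["vax8650"] / share["vax750"]  # 7/3, about 2.33

for host, frac in share.items():
    print(f"{host}: {frac} of lookups ({float(frac):.0%})")
```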

   ------

   An entry like:

           bunch   in      cip     vax1
                   in      cip     pyr1
                   in      cip     sun1
                   in      mx      mailmachine
                   in      hinfo   Admin. Center Time Sharing Computers
           vax1    in      a       128.6.69.10
                   in      mx      mailmachine1
           pyr1    in      a       128.6.18.22
                   in      mx      mailmachine2
           sun1    in      a       128.6.12.65
                   in      mx      mailmachine3

   would evenly distribute the usage across the machines "vax1", "pyr1",
   and "sun1".  Mail addressed to users at "bunch" would be delivered to
   "mailmachine", since the MX associated with the cluster would always
   be found first.  Mail addressed to users at "vax1", "pyr1", and
   "sun1" would all be delivered to the respective hosts indicated in
   the MX RRs.

5. Security Considerations

   Security issues are not discussed in this memo.

6. Author's Address

   Thomas P. Brisco
   Rutgers University
   Computing Services
   Hill Center for the Mathematical Sciences
   Busch Campus
   P.O. Box 879
   Piscataway, New Jersey 08855-0879

   Phone: 908-932-2351

   EMail: brisco@RUTGERS.EDU


milton@en.ecn.purdue.edu (Milton D Miller) (04/10/91)

I sent this to the two people mentioned, but will put this out to the
group also.

In article <9104082253.AA26014@aggie.ucdavis.edu> Russ Hobby writes:
>The following document has been sent to the RFC Editor to be an
>Experimental RFC (as opposed to being on the standards track).  It is
>along the lines of the MX Record discussion that has been going on. The
>RFC Editor has given one week (until Apr 15) to review the document and
>to say if it is a "good thing".
>
>
>Send your comments to me <rdhobby@ucdavis.edu> and Greg Vaudreuil
><gvaudre@nri.reston.va.us> (since I will be on vacation starting
>Saturday) and, of course, the WG mail list.
>
>
>Russ Hobby                              INTERNET: rdhobby@ucdavis.edu  
>IETF Area Director - Applications       BITNET:   RDHOBBY@UCDAVIS  
>                                        UUCP:  ...!ucbvax!ucdavis!rdhobby 

I don't know what the WG mailing list is, so one of you can forward it
if you feel like it.  I am also posting this back to the newsgroup.

The first thing I notice about the proposal is that it fails to address
the problem with all internal mail going through the gateway to reach
other local machines.  As written, only the gateway can use the LMX
record, and not other hosts in the AS.  The presently stated record
only defines one forwarding hop, which may as well be in a configuration
file on the gateway.

Since I am complaining, I will also suggest a modified record composed
on the spot (not discussed with anyone yet) to reduce this problem,
which appears after the excerpt.

>1. Overview
>
>   This memo is intended to standardize a method for the determination
>   of local mail addresses for use within an organization only.  The
>   Domain Name System Resource Record detailed herein is designed for
>   use from a mail gateway to client machines only.
>
>3. The LMX RR
>
>   The LMX ("Local Mail eXchanger") record is for use within the
>   organization's autonomous system (since the address specified by the
>   LMX will probably not be addressable from external networks).  It is
>   the mechanism by which the mail gateway may determine an address for
>   a host on the local network.  The mail gateway, which receives a message
>   bound for a host for which it is the mail exchanger (i.e., the
>   gateway's own host name is specified in the MX record) may attempt to
>   retrieve an LMX record to determine the local address accepting mail
>   for this host.
>
>4. Format of the LMX RR
>
>   The LMX is a DNS resource record, the data specified in it is case
>   insensitive, it has type code XX (to be assigned by the IANA).  The
>   LMX has the following format:
>
>   <ehostname> <ttl> <class> LMX <weight> <lhostname>
>
>   Both RDATA fields are required in all LMX RRs.  The <ehostname> is
>   the domain name of the external name by which the host is known.  The
>   <lhostname> is the domain name of the internal name by which the host
>   is known.  LMX records cause type A additional section processing for
>   <lhostname>.
>

How about adding a field saying who can use this record?  For example:

 <hostname> <ttl> <class> LMX <weight> <server> <who>

where who is the optional domain name of who may use the record, and
defaults to the hosts listed in the MX records.  The who is not
necessarily a host name; for example, a who of foo.com. means all hosts
under the domain foo.com.  The LMX records could then be used for the
internal domain, and MX records for the external domain.

All LMX weights exist in a single namespace; a host named in an LMX
cannot use a record of lower precedence, regardless of the who field
(to eliminate loops).

Searching for and processing of LMX records is optional provided a host
is not pointed to by an LMX.  A host sorts the records by weight, then
attempts delivery to each server listed in turn, skipping any records
whose who field does not match its own name.  Processing stops if a host
finds its own name as the server.  If no applicable records were found,
a host would then proceed to use the existing MX records as is presently
done.
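
One possible reading of the processing rules above, written as a
Python sketch.  The LmxRecord and lmx_candidates names are hypothetical,
who is matched as a domain suffix, and an omitted who is taken to mean
an exact match against the hosts in the MX records; none of this has
been agreed on.

```python
from typing import List, NamedTuple, Optional


class LmxRecord(NamedTuple):
    weight: int
    server: str
    who: Optional[str]  # None: default to the hosts named in the MX RRs


def _norm(name: str) -> str:
    return name.lower().rstrip(".")


def in_domain(host: str, domain: str) -> bool:
    """True if `host` is `domain` itself or lies underneath it."""
    host, domain = _norm(host), _norm(domain)
    return host == domain or host.endswith("." + domain)


def lmx_candidates(me: str, records: List[LmxRecord],
                   mx_hosts: List[str]) -> List[str]:
    """Servers `me` should try, in order; empty means fall back to MX."""
    servers: List[str] = []
    for rec in sorted(records, key=lambda r: r.weight):
        if _norm(rec.server) == _norm(me):
            # Our own entry: records of lower precedence are off limits
            # whatever their who field says, which prevents loops.
            break
        if rec.who is None:
            allowed = _norm(me) in {_norm(h) for h in mx_hosts}
        else:
            allowed = in_domain(me, rec.who)
        if allowed:
            servers.append(rec.server)
    return servers
```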

This does not explicitly handle one of the cases that came up in the
newsgroup discussion -- the hypothetical Australian embassy host in
the AU domain connected to the USA internet.  However, if these
are all in one domain, they can have an LMX for them pointing to the
external MXs.  It also does not address sites that wish to hide their
internal host names from the outside world in the name of security
(I won't comment on that :-).

milton

stodola@FCCC.EDU (Bob Stodola) (04/10/91)

I read with great interest the proposal for LMX RRs.  While I see the
purpose (indeed, campus mail routing here goes through all sorts of
channels, and keeping it straight is a headache), I have three reasons
why this proposal is not an ideal solution to this problem:

	1.  I suspect that it may be redundant.  Higher preference MXs
	    (lower numbered) citing systems which are not accessible to the
	    outside world are one of the cited purposes of MX preference
	    codes -- outside mailers simply fail to reach the internally
	    accessible systems, and proceed to find the gateway system.
	    In the context of the example in the XRFC, the DNS entries would
	    be:
		public.rutgers.edu  MX 0 public.rutgers.edu
		                    MX 10 gateway.rutgers.edu
	    Outsiders will fail on delivery to public and then try gateway.
	    Gateway is supposed to know that it cannot attempt delivery to
	    preference values equal to or greater than its own, and should
	    attempt delivery to public.

	2.  It is unclear what I would put in the LMX RDATA field that would
	    eliminate the need for an external routing database.  For example,
	    our campus mailer delivers mail to systems via TCP/IP, DECnet,
	    AppleTalk shared disks, serial dial-up connections and other more
	    bizarre routing.  If you need the external routing database,
	    the LMX seems to be an unnecessary level of complication.

	3.  Given that the information has no value whatsoever to the outside
	    world, I'm a little uncomfortable including it in the IN class
	    database (as opposed to the HS or ?? class).
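
The rule in point 1, that an exchanger discards MX RRs whose preference
is equal to or greater than its own, is the one described in RFC 974.
A minimal Python sketch of it follows, using the public/gateway entries
from point 1; usable_mx is a hypothetical name, and address lookup and
retries are left out.

```python
from typing import List, Tuple


def usable_mx(mx_rrs: List[Tuple[int, str]], me: str) -> List[str]:
    """Exchangers `me` may try for a destination, best preference first."""
    ranked = sorted(mx_rrs)
    own = [pref for pref, host in ranked if host.lower() == me.lower()]
    if own:
        # `me` is itself an exchanger: keep only strictly better ones so
        # mail always moves towards lower preference values.
        limit = min(own)
        ranked = [(pref, host) for pref, host in ranked if pref < limit]
    return [host for _, host in ranked]
```

An outside mailer gets both hosts in order, while the gateway is left
with only public.rutgers.edu, which is the behaviour point 1 relies on.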


--------------------------------------------------------------------------
Robert K. Stodola                            Phone: (215) 728-3660
Manager, Research Computing Services         FAX: (215) 728-3574
The Fox Chase Cancer Center                  internet: stodola@fccc.edu
7701 Burholme Avenue              +---------------------------------------
Philadelphia, PA  19111           | "You are in a maze of twisty passages,
USA                               |  all alike.  There is a man page here.
----------------------------------+---------------------------------------

almquist@JESSICA.STANFORD.EDU ("Philip Almquist") (04/14/91)

Russ,
	Since nobody else has had much to say on this, and since I don't
know if you saw the message I sent to Dave Crocker, here are my comments
on the proposed CIP resource record (BTW, you might have gotten more of
a response from the Working Group if you'd sent the message to its
current mailing list, which I believe is dns-wg@nsl.dec.com).

	The purpose of the CIP record is to allow a generic name to
refer to multiple, functionally equivalent hosts.  When a DNS server
receives a request for the address of such a generic name, it synthesizes
an A record for the generic name giving the address of one of the real
machines.  There are various ways that the DNS server could conceivably
choose which of the real machines to return information about.  The
proposal decrees that the server should use a particular mechanism, a
weighted round robin scheme.

	Something that makes the CIP record rather extraordinary is that
it is basically just a directive telling the server how to function.  A
server will not normally include a CIP record in any response that it
sends.  This suggests to me that there might be alternative, non-
protocol mechanisms which accomplish the same purpose.

	Indeed, the problem of how to have a generic name for multiple
equivalent hosts was addressed by the DNS Working Group some time ago.
The group found that no new record types were needed or desirable.  CMU
(and probably other places) already do just what the author of the CIP
proposal wants to be able to do, without any extensions to the DNS
protocol.

	How do they do it?  They delegate authority for the generic name
to a special nameserver.  When that special nameserver gets a request
for the address associated with the generic name, it creates and returns
an A record claiming that the address associated with the generic name
is the address of one of the real hosts.  (Actually, there is no real
reason to have the special nameserver be separate from the regular one,
except that it simplified implementation).

	How does the special server decide which address to return?  It
could return the address of the machine with the lowest load average.
It could return the addresses using a weighted round robin scheme (in
which case, its configuration file could even contain things that look
like CIP records).  Or it could do something else...  The point is that
the answer to the question in the first sentence is what the OSI people
call a "local matter".
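
As a sketch of the "local matter" point, here is one selection policy
such a special nameserver might use, in Python.  The load figures and
how they are collected are hypothetical (the DNS protocol does not carry
them), the addresses are borrowed from the CIP examples, and a weighted
round robin could be passed in as the policy instead.

```python
from typing import Callable, Dict, Tuple

# Hypothetical: address -> current load average, collected by some
# out-of-band means that the DNS protocol itself does not specify.
Loads = Dict[str, float]
Policy = Callable[[Loads], str]


def lowest_load(loads: Loads) -> str:
    """One possible policy: pick the least loaded real host."""
    return min(loads, key=lambda addr: loads[addr])


def synthesize_a(generic: str, loads: Loads,
                 policy: Policy = lowest_load) -> Tuple[str, str, str]:
    """Build the A record the special server returns for a generic name."""
    return (generic, "A", policy(loads))
```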

	Does the CIP mechanism have any advantage?  Yes, there's a small
one.  Essentially, it standardizes the configuration information about
generic names sufficiently that the config files are portable to other
implementations of the mechanism.  Because the config files also happen
to be zone files, the config information can also be zone transferred to
other implementations.  However, I believe that the costs of the CIP
proposal outweigh its benefits.  The standardization of the config
information for generic names is achieved at the cost of requiring that
anyone using the mechanism must use weighted round robin (or else
forget about CIP and use the current mechanism).  Additionally, the CIP
proposal would have to be implemented, whereas the current mechanism is
already implemented.

	I can't speak for the DNS Working Group, but my own opinion is
that the CIP record would bloat the standard for little if any real
gain.  If an RFC is to be published on the topic, it should instead
describe the currently used mechanism (and perhaps note where the
existing implementations may be obtained).
							Philip

brisco@pilot.njin.net (Thomas P. Brisco) (04/16/91)

In <9104140321.AA05594@jessica.stanford.edu>, almquist@JESSICA.STANFORD.EDU ("Philip Almquist") says
[...]
>	The purpose of the CIP record is to allow a generic name to
>refer to multiple, functionally equivalent hosts.  When a DNS server
>receives a request for the address of such a generic name, it synthesizes
>an A record for the generic name giving the address of one of the real
>machines.  There are various ways that the DNS server could conceivably
>choose which of the real machines to return information about.  The
>proposal decrees that the server should use a particular mechanism, a
>weighted round robin scheme.
>
>	Something that makes the CIP record rather extraordinary is that
>it is basically just a directive telling the server how to function.  A
>server will not normally include a CIP record in any response that it
>sends.  This suggests to me that there might be alternative, non-
>protocol mechanisms which accomplish the same purpose.

    The proposal doesn't explicitly mandate round robin; round robin
just happened to be what my first implementations used.  The
weighting is the more salient aspect of the RR.
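For illustration only, here is one way a server could honor per-member weights. It uses smooth weighted round robin, a technique the proposal does not name (the point above is that the weighting, not the round robin, is what matters). The `Member` class, its fields, and the host names are hypothetical. A minimal Python sketch:

```python
from dataclasses import dataclass


@dataclass
class Member:
    """One host in a cluster (hypothetical representation)."""
    host: str
    weight: int
    current: int = 0  # running score used by the selection algorithm


def pick(members: list[Member]) -> Member:
    """Smooth weighted round robin: over time, each member is chosen
    in proportion to its weight, with picks spread out evenly."""
    if not members:
        raise ValueError("cluster has no members")
    total = sum(m.weight for m in members)
    for m in members:
        m.current += m.weight          # every member earns its weight
    best = max(members, key=lambda m: m.current)  # first max wins ties
    best.current -= total              # the winner pays back the total
    return best


cluster = [Member("a.example.edu.", 3), Member("b.example.edu.", 1)]
print([pick(cluster).host for _ in range(4)])
```

With weights 3 and 1, four consecutive calls return a, a, b, a and the scores return to zero, so the cycle repeats. The picks are interleaved rather than bunched, which is the usual reason to prefer this variant over simply listing each host `weight` times.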

    On your second paragraph: the key is that servers _will_
pass the cluster information on, but only under the right
conditions.  The information will be passed on to secondaries,
etc.  I prefer to think of it as a "polymorphic" record: it
seems to change slightly depending on the method of access.
Sometimes it looks like a CNAME (in additional processing),
sometimes it looks like an MX (weighting), and sometimes like an
RR in its own right.  The CIP doesn't impose much reasoning
upon the RR; the records aren't returned based on any dynamic
knowledge of the system (that, indeed, belongs in a special purpose
nameserver).  It is a "lightweight" version of what you describe.
Sometimes the distribution of resources is grey enough that
only an approximation of the load is necessary or even
beneficial (is a machine loaded because a lot of processes are
disk bound?  CPU bound?  Because the number of logins is at its
maximum?).

>	Indeed, the problem of how to have a generic name for multiple
>equivalent hosts was addressed by the DNS Working Group some time ago.
>The group found that no new record types were needed or desirable.  CMU
>(and probably other places) already do just what the author of the CIP
>proposal wants to be able to do, without any extensions to the DNS
>protocol.
>
>	How do they do it?  They delegate authority for the generic name
>to a special nameserver.  When that special nameserver gets a request
>for the address associated with the generic name, it creates and returns
>an A record claiming that the address associated with the generic name
>is the address of one of the real hosts.  (Actually, there is no real
>reason to have the special nameserver be separate from the regular one,
>except that it simplified implementation).

    However, there is no way of passing around the knowledge that
some name is actually a cluster, and not a single address.  Suppose
that at your site you run about 5 secondary nameservers (it
seems you have geographically noncontiguous campuses) and need
several nameservers to act autonomously, while each still hands
out addresses in such a way that some level of "load sharing"
occurs over a series of hosts.  There is no defined way of dispersing
the "clusterness" (gak) of a group of machines.

    In fact, you could replicate an entire second set of "load
knowledgeable nameservers" (sentient, in my terms) around your
campuses.  But then there are a lot of files to update, a lot
of extra daemons, etc., etc.

    Note: I don't rule out the possibility that a load-knowledgeable
nameserver could use the CIP records from a non-sentient
nameserver.  I would hope that people would use this aspect of
them (by asking for type "any").  Using the CIP records, a
sentient server could distinguish a series of clusters from
a single cluster.  Some additional language would be needed
to tell a sentient nameserver where one cluster ends and the
next begins.  Assuming that a site has multiple clusters,
it would be nice to have only one nameserver handing out addresses
for a series of clusters (the nameserver could be sentient or
non-sentient).  Introducing some new method of indicating where
clusters begin or end would be somewhat clumsy and prone to
errors.
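As a rough illustration of how a sentient server might recover cluster boundaries from records it fetched, the sketch below groups records by owner name. It assumes (hypothetically) that all members of one cluster share an owner name and that each record arrives as a simplified text line of the form `owner CIP weight target`, with no TTL or class field; the real CIP syntax may differ. A minimal Python sketch:

```python
from collections import defaultdict


def clusters_from_records(lines):
    """Group hypothetical 'owner CIP weight target' lines by owner name.

    Returns {cluster_name: [(member_host, weight), ...]}, keeping the
    members of each cluster in the order they appeared.
    """
    clusters = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) != 4 or fields[1].upper() != "CIP":
            continue  # not a (hypothetical) CIP record; ignore it
        owner, _, weight, target = fields
        # DNS names compare case-insensitively, so normalize them.
        clusters[owner.lower()].append((target.lower(), int(weight)))
    return dict(clusters)


zone = [
    "www.example.edu. CIP 3 a.example.edu.",
    "www.example.edu. CIP 1 b.example.edu.",
    "ftp.example.edu. CIP 1 c.example.edu.",
    "a.example.edu. A 192.0.2.1",
]
print(clusters_from_records(zone))
```

Here the owner name itself marks where one cluster ends and the next begins, so no separate delimiter is needed; the ordinary A record is skipped.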

    Additional nameservers which cache initial replies are going
to defeat the distribution of tasks amongst the members of the
clusters.  Remote nameservers need to be told to act slightly
differently with this address (hence the CNAME with a low TTL,
but an A record with a normal TTL).  As much information as
possible is cached at the remote nameserver, but some
information will have to be retrieved every time.  Authoritative
secondaries (in this scheme) can at least approximate the load
sharing that is going on while handing out records to local
sites, further minimizing unnecessary traffic.
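The TTL split described above can be sketched as follows: the answer for a cluster name is a short-lived CNAME pointing at one weighted pick, plus that member's A record with an ordinary TTL. The TTL values, the `RR` tuple, the `answer` function, and the host names and addresses are all hypothetical, and a weighted random pick (`random.choices`) stands in for whatever selection a real server would use. A minimal Python sketch:

```python
import random
from typing import NamedTuple


class RR(NamedTuple):
    """A simplified resource record (hypothetical representation)."""
    name: str
    rtype: str
    ttl: int
    data: str


CLUSTER_TTL = 60   # hypothetical short TTL: caches re-ask soon
HOST_TTL = 86400   # hypothetical normal TTL: the host's address is stable


def answer(cluster, members, addresses, rng=random):
    """Return [CNAME, A] for one weighted pick among the cluster's members.

    members:   list of (host, weight) pairs
    addresses: dict mapping each host to its IPv4 address string
    """
    hosts = [h for h, _ in members]
    weights = [w for _, w in members]
    host = rng.choices(hosts, weights=weights, k=1)[0]
    return [
        RR(cluster, "CNAME", CLUSTER_TTL, host),   # expires quickly
        RR(host, "A", HOST_TTL, addresses[host]),  # safe to cache
    ]


print(answer("www.example.edu.",
             [("a.example.edu.", 3), ("b.example.edu.", 1)],
             {"a.example.edu.": "192.0.2.1", "b.example.edu.": "192.0.2.2"}))
```

Because the CNAME expires quickly, a caching server comes back for a fresh pick fairly often, while the A records it has already learned can stay in its cache for the normal TTL.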

>	How does the special server decide which address to return?  It
>could return the address of the machine with the lowest load average.
>It could return the addresses using a weighted round robin scheme (in
>which case, its configuration file could even contain things that look
>like CIP records).  Or it could do something else...  The point is that
>the answer to the question in the first sentence is what the OSI people
>call a "local matter".

    Again, how does the nameserver share the concept that the
hosts are "clustered", i.e. logically one?  I'd like my cluster backed
up, far away, preferably by my secondary.  And when my secondaries
hand out information about my clusters, I'd like that information
to be honored.

>	Does the CIP mechanism have any advantage?  Yes, there's a small
>one.  Essentially, it standardizes the configuration information about
>generic names sufficiently that the config files are portable to other
>implementations of the mechanism.  Because the config files also happen
>to be zone files, the config information can also be zone transferred to
>other implementations.  However, I believe that the costs of the CIP
>proposal outweigh its benefits.  The standardization of the config
>information for generic names is achieved at the cost of requiring that
>anyone using the mechanism has to use weighted round robin (or else
>forget about CIP and use the current mechanism).  Additionally, the CIP
>proposal would have to be implemented, whereas the current mechanism is
>already implemented.

    No, it needn't be round robin; in fact, a later release implemented
something more efficient.  And I don't believe consistent syntax is a
small advantage: I'd like to have all of my nameservers handing out this
information, and I'd prefer not to hack it in.  In most cases,
all that is needed is a reasonable approximation of load sharing, but
many people need to do it for a lot of "clusters".  The ability
to cleanly indicate the logical equivalence of a series of hosts,
as opposed to relying on "magic domains", shouldn't be trivialized.

>	I can't speak for the DNS Working Group, but my own opinion is
>that the CIP record would bloat the standard for little if any real
>gain.  If an RFC is to be published on the topic, it should instead
>describe the currently used mechanism (and perhaps note where the
>existing implementations may be obtained).
>							Philip

    I have to admit that the CIP record is unusual; however,
I do feel that it addresses a real need.  A zone can mean a lot
of things (a workgroup, an administrative unit, a delegation
of authority), but a cluster can mean only one thing, and it
should be treated slightly differently.


						    Tp.
-- 

...!rutgers!brisco (UUCP)               brisco@pilot.njin.net (ARPA)
    brisco@ZODIAC (BITNET)              908-932-2351          (VOICE)

Just say "Moo"