[comp.doc] RFC1035 | part 2 of 3

brian@ucsd.EDU (Brian Kantor) (02/26/88)
---
classes, and which avoid duplication of information.  New classes are
appropriate when the DNS is to be used for a new protocol, etc which
requires new class-specific data formats, or when a copy of the existing
name space is desired, but a separate management domain is necessary.

New types and classes need mnemonics for master files; the format of the
master files requires that the mnemonics for type and class be disjoint.

TYPE and CLASS values must be a proper subset of QTYPEs and QCLASSes
respectively.

The present system uses multiple RRs to represent multiple values of a
type rather than storing multiple values in the RDATA section of a
single RR.  This is less efficient for most applications, but does keep
RRs shorter.  The multiple RRs assumption is incorporated in some
experimental work on dynamic update methods.

The present system attempts to minimize the duplication of data in the
database in order to insure consistency.  Thus, in order to find the
address of the host for a mail exchange, you map the mail domain name to
a host name, then the host name to addresses, rather than a direct
mapping to host address.  This approach is preferred because it avoids
the opportunity for inconsistency.

In defining a new type of data, multiple RR types should not be used to
create an ordering between entries or express different formats for
equivalent bindings, instead this information should be carried in the
body of the RR and a single type used.  This policy avoids problems with
caching multiple types and defining QTYPEs to match multiple types.

For example, the original form of mail exchange binding used two RR
types one to represent a "closer" exchange (MD) and one to represent a
"less close" exchange (MF).  The difficulty is that the presence of one
RR type in a cache doesn't convey any information about the other
because the query which acquired the cached information might have used
a QTYPE of MF, MD, or MAILA (which matched both).  The redesigned



Mockapetris                                                    [Page 24]

RFC 1035        Domain Implementation and Specification    November 1987


service used a single type (MX) with a "preference" value in the RDATA
section which can order different RRs.  However, if any MX RRs are found
in the cache, then all should be there.

4. MESSAGES

4.1. Format

All communications inside of the domain protocol are carried in a single
format called a message.  The top level format of message is divided
into 5 sections (some of which are empty in certain cases) shown below:

    +---------------------+
    |        Header       |
    +---------------------+
    |       Question      | the question for the name server
    +---------------------+
    |        Answer       | RRs answering the question
    +---------------------+
    |      Authority      | RRs pointing toward an authority
    +---------------------+
    |      Additional     | RRs holding additional information
    +---------------------+

The header section is always present.  The header includes fields that
specify which of the remaining sections are present, and also specify
whether the message is a query or a response, a standard query or some
other opcode, etc.

The names of the sections after the header are derived from their use in
standard queries.  The question section contains fields that describe a
question to a name server.  These fields are a query type (QTYPE), a
query class (QCLASS), and a query domain name (QNAME).  The last three
sections have the same format: a possibly empty list of concatenated
resource records (RRs).  The answer section contains RRs that answer the
question; the authority section contains RRs that point toward an
authoritative name server; the additional records section contains RRs
which relate to the query, but are not strictly answers for the
question.












Mockapetris                                                    [Page 25]

RFC 1035        Domain Implementation and Specification    November 1987


4.1.1. Header section format

The header contains the following fields:

                                    1  1  1  1  1  1
      0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                      ID                       |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |QR|   Opcode  |AA|TC|RD|RA|   Z    |   RCODE   |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                    QDCOUNT                    |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                    ANCOUNT                    |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                    NSCOUNT                    |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                    ARCOUNT                    |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

where:

ID              A 16 bit identifier assigned by the program that
                generates any kind of query.  This identifier is copied
                the corresponding reply and can be used by the requester
                to match up replies to outstanding queries.

QR              A one bit field that specifies whether this message is a
                query (0), or a response (1).

OPCODE          A four bit field that specifies kind of query in this
                message.  This value is set by the originator of a query
                and copied into the response.  The values are:

                0               a standard query (QUERY)

                1               an inverse query (IQUERY)

                2               a server status request (STATUS)

                3-15            reserved for future use

AA              Authoritative Answer - this bit is valid in responses,
                and specifies that the responding name server is an
                authority for the domain name in question section.

                Note that the contents of the answer section may have
                multiple owner names because of aliases.  The AA bit



Mockapetris                                                    [Page 26]

RFC 1035        Domain Implementation and Specification    November 1987


                corresponds to the name which matches the query name, or
                the first owner name in the answer section.

TC              TrunCation - specifies that this message was truncated
                due to length greater than that permitted on the
                transmission channel.

RD              Recursion Desired - this bit may be set in a query and
                is copied into the response.  If RD is set, it directs
                the name server to pursue the query recursively.
                Recursive query support is optional.

RA              Recursion Available - this be is set or cleared in a
                response, and denotes whether recursive query support is
                available in the name server.

Z               Reserved for future use.  Must be zero in all queries
                and responses.

RCODE           Response code - this 4 bit field is set as part of
                responses.  The values have the following
                interpretation:

                0               No error condition

                1               Format error - The name server was
                                unable to interpret the query.

                2               Server failure - The name server was
                                unable to process this query due to a
                                problem with the name server.

                3               Name Error - Meaningful only for
                                responses from an authoritative name
                                server, this code signifies that the
                                domain name referenced in the query does
                                not exist.

                4               Not Implemented - The name server does
                                not support the requested kind of query.

                5               Refused - The name server refuses to
                                perform the specified operation for
                                policy reasons.  For example, a name
                                server may not wish to provide the
                                information to the particular requester,
                                or a name server may not wish to perform
                                a particular operation (e.g., zone



Mockapetris                                                    [Page 27]

RFC 1035        Domain Implementation and Specification    November 1987


                                transfer) for particular data.

                6-15            Reserved for future use.

QDCOUNT         an unsigned 16 bit integer specifying the number of
                entries in the question section.

ANCOUNT         an unsigned 16 bit integer specifying the number of
                resource records in the answer section.

NSCOUNT         an unsigned 16 bit integer specifying the number of name
                server resource records in the authority records
                section.

ARCOUNT         an unsigned 16 bit integer specifying the number of
                resource records in the additional records section.

4.1.2. Question section format

The question section is used to carry the "question" in most queries,
i.e., the parameters that define what is being asked.  The section
contains QDCOUNT (usually 1) entries, each of the following format:

                                    1  1  1  1  1  1
      0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                                               |
    /                     QNAME                     /
    /                                               /
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                     QTYPE                     |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                     QCLASS                    |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

where:

QNAME           a domain name represented as a sequence of labels, where
                each label consists of a length octet followed by that
                number of octets.  The domain name terminates with the
                zero length octet for the null label of the root.  Note
                that this field may be an odd number of octets; no
                padding is used.

QTYPE           a two octet code which specifies the type of the query.
                The values for this field include all codes valid for a
                TYPE field, together with some more general codes which
                can match more than one type of RR.



Mockapetris                                                    [Page 28]

RFC 1035        Domain Implementation and Specification    November 1987


QCLASS          a two octet code that specifies the class of the query.
                For example, the QCLASS field is IN for the Internet.

4.1.3. Resource record format

The answer, authority, and additional sections all share the same
format: a variable number of resource records, where the number of
records is specified in the corresponding count field in the header.
Each resource record has the following format:
                                    1  1  1  1  1  1
      0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                                               |
    /                                               /
    /                      NAME                     /
    |                                               |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                      TYPE                     |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                     CLASS                     |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                      TTL                      |
    |                                               |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                   RDLENGTH                    |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--|
    /                     RDATA                     /
    /                                               /
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

where:

NAME            a domain name to which this resource record pertains.

TYPE            two octets containing one of the RR type codes.  This
                field specifies the meaning of the data in the RDATA
                field.

CLASS           two octets which specify the class of the data in the
                RDATA field.

TTL             a 32 bit unsigned integer that specifies the time
                interval (in seconds) that the resource record may be
                cached before it should be discarded.  Zero values are
                interpreted to mean that the RR can only be used for the
                transaction in progress, and should not be cached.





Mockapetris                                                    [Page 29]

RFC 1035        Domain Implementation and Specification    November 1987


RDLENGTH        an unsigned 16 bit integer that specifies the length in
                octets of the RDATA field.

RDATA           a variable length string of octets that describes the
                resource.  The format of this information varies
                according to the TYPE and CLASS of the resource record.
                For example, the if the TYPE is A and the CLASS is IN,
                the RDATA field is a 4 octet ARPA Internet address.

4.1.4. Message compression

In order to reduce the size of messages, the domain system utilizes a
compression scheme which eliminates the repetition of domain names in a
message.  In this scheme, an entire domain name or a list of labels at
the end of a domain name is replaced with a pointer to a prior occurance
of the same name.

The pointer takes the form of a two octet sequence:

    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    | 1  1|                OFFSET                   |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

The first two bits are ones.  This allows a pointer to be distinguished
from a label, since the label must begin with two zero bits because
labels are restricted to 63 octets or less.  (The 10 and 01 combinations
are reserved for future use.)  The OFFSET field specifies an offset from
the start of the message (i.e., the first octet of the ID field in the
domain header).  A zero offset specifies the first byte of the ID field,
etc.

The compression scheme allows a domain name in a message to be
represented as either:

   - a sequence of labels ending in a zero octet

   - a pointer

   - a sequence of labels ending with a pointer

Pointers can only be used for occurances of a domain name where the
format is not class specific.  If this were not the case, a name server
or resolver would be required to know the format of all RRs it handled.
As yet, there are no such cases, but they may occur in future RDATA
formats.

If a domain name is contained in a part of the message subject to a
length field (such as the RDATA section of an RR), and compression is



Mockapetris                                                    [Page 30]

RFC 1035        Domain Implementation and Specification    November 1987


used, the length of the compressed name is used in the length
calculation, rather than the length of the expanded name.

Programs are free to avoid using pointers in messages they generate,
although this will reduce datagram capacity, and may cause truncation.
However all programs are required to understand arriving messages that
contain pointers.

For example, a datagram might need to use the domain names F.ISI.ARPA,
FOO.F.ISI.ARPA, ARPA, and the root.  Ignoring the other fields of the
message, these domain names might be represented as:

       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    20 |           1           |           F           |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    22 |           3           |           I           |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    24 |           S           |           I           |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    26 |           4           |           A           |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    28 |           R           |           P           |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    30 |           A           |           0           |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    40 |           3           |           F           |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    42 |           O           |           O           |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    44 | 1  1|                20                       |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    64 | 1  1|                26                       |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    92 |           0           |                       |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

The domain name for F.ISI.ARPA is shown at offset 20.  The domain name
FOO.F.ISI.ARPA is shown at offset 40; this definition uses a pointer to
concatenate a label for FOO to the previously defined F.ISI.ARPA.  The
domain name ARPA is defined at offset 64 using a pointer to the ARPA
component of the name F.ISI.ARPA at 20; note that this pointer relies on
ARPA being the last label in the string at 20.  The root domain name is



Mockapetris                                                    [Page 31]

RFC 1035        Domain Implementation and Specification    November 1987


defined by a single octet of zeros at 92; the root domain name has no
labels.

4.2. Transport

The DNS assumes that messages will be transmitted as datagrams or in a
byte stream carried by a virtual circuit.  While virtual circuits can be
used for any DNS activity, datagrams are preferred for queries due to
their lower overhead and better performance.  Zone refresh activities
must use virtual circuits because of the need for reliable transfer.

The Internet supports name server access using TCP [RFC-793] on server
port 53 (decimal) as well as datagram access using UDP [RFC-768] on UDP
port 53 (decimal).

4.2.1. UDP usage

Messages sent using UDP user server port 53 (decimal).

Messages carried by UDP are restricted to 512 bytes (not counting the IP
or UDP headers).  Longer messages are truncated and the TC bit is set in
the header.

UDP is not acceptable for zone transfers, but is the recommended method
for standard queries in the Internet.  Queries sent using UDP may be
lost, and hence a retransmission strategy is required.  Queries or their
responses may be reordered by the network, or by processing in name
servers, so resolvers should not depend on them being returned in order.

The optimal UDP retransmission policy will vary with performance of the
Internet and the needs of the client, but the following are recommended:

   - The client should try other servers and server addresses
     before repeating a query to a specific address of a server.

   - The retransmission interval should be based on prior
     statistics if possible.  Too aggressive retransmission can
     easily slow responses for the community at large.  Depending
     on how well connected the client is to its expected servers,
     the minimum retransmission interval should be 2-5 seconds.

More suggestions on server selection and retransmission policy can be
found in the resolver section of this memo.

4.2.2. TCP usage

Messages sent over TCP connections use server port 53 (decimal).  The
message is prefixed with a two byte length field which gives the message



Mockapetris                                                    [Page 32]

RFC 1035        Domain Implementation and Specification    November 1987


length, excluding the two byte length field.  This length field allows
the low-level processing to assemble a complete message before beginning
to parse it.

Several connection management policies are recommended:

   - The server should not block other activities waiting for TCP
     data.

   - The server should support multiple connections.

   - The server should assume that the client will initiate
     connection closing, and should delay closing its end of the
     connection until all outstanding client requests have been
     satisfied.

   - If the server needs to close a dormant connection to reclaim
     resources, it should wait until the connection has been idle
     for a period on the order of two minutes.  In particular, the
     server should allow the SOA and AXFR request sequence (which
     begins a refresh operation) to be made on a single connection.
     Since the server would be unable to answer queries anyway, a
     unilateral close or reset may be used instead of a graceful
     close.

5. MASTER FILES

Master files are text files that contain RRs in text form.  Since the
contents of a zone can be expressed in the form of a list of RRs a
master file is most often used to define a zone, though it can be used
to list a cache's contents.  Hence, this section first discusses the
format of RRs in a master file, and then the special considerations when
a master file is used to create a zone in some name server.

5.1. Format

The format of these files is a sequence of entries.  Entries are
predominantly line-oriented, though parentheses can be used to continue
a list of items across a line boundary, and text literals can contain
CRLF within the text.  Any combination of tabs and spaces act as a
delimiter between the separate items that make up an entry.  The end of
any line in the master file can end with a comment.  The comment starts
with a ";" (semicolon).

The following entries are defined:

    <blank>[<comment>]




Mockapetris                                                    [Page 33]

RFC 1035        Domain Implementation and Specification    November 1987


    $ORIGIN <domain-name> [<comment>]

    $INCLUDE <file-name> [<domain-name>] [<comment>]

    <domain-name><rr> [<comment>]

    <blank><rr> [<comment>]

Blank lines, with or without comments, are allowed anywhere in the file.

Two control entries are defined: $ORIGIN and $INCLUDE.  $ORIGIN is
followed by a domain name, and resets the current origin for relative
domain names to the stated name.  $INCLUDE inserts the named file into
the current file, and may optionally specify a domain name that sets the
relative domain name origin for the included file.  $INCLUDE may also
have a comment.  Note that a $INCLUDE entry never changes the relative
origin of the parent file, regardless of changes to the relative origin
made within the included file.

The last two forms represent RRs.  If an entry for an RR begins with a
blank, then the RR is assumed to be owned by the last stated owner.  If
an RR entry begins with a <domain-name>, then the owner name is reset.

<rr> contents take one of the following forms:

    [<TTL>] [<class>] <type> <RDATA>

    [<class>] [<TTL>] <type> <RDATA>

The RR begins with optional TTL and class fields, followed by a type and
RDATA field appropriate to the type and class.  Class and type use the
standard mnemonics, TTL is a decimal integer.  Omitted class and TTL
values are default to the last explicitly stated values.  Since type and
class mnemonics are disjoint, the parse is unique.  (Note that this
order is different from the order used in examples and the order used in
the actual RRs; the given order allows easier parsing and defaulting.)

<domain-name>s make up a large share of the data in the master file.
The labels in the domain name are expressed as character strings and
separated by dots.  Quoting conventions allow arbitrary characters to be
stored in domain names.  Domain names that end in a dot are called
absolute, and are taken as complete.  Domain names which do not end in a
dot are called relative; the actual domain name is the concatenation of
the relative part with an origin specified in a $ORIGIN, $INCLUDE, or as
an argument to the master file loading routine.  A relative name is an
error when no origin is available.





Mockapetris                                                    [Page 34]

RFC 1035        Domain Implementation and Specification    November 1987


<character-string> is expressed in one or two ways: as a contiguous set
of characters without interior spaces, or as a string beginning with a "
and ending with a ".  Inside a " delimited string any character can
occur, except for a " itself, which must be quoted using \ (back slash).

Because these files are text files several special encodings are
necessary to allow arbitrary data to be loaded.  In particular:

                of the root.

@               A free standing @ is used to denote the current origin.

\X              where X is any character other than a digit (0-9), is
                used to quote that character so that its special meaning
                does not apply.  For example, "\." can be used to place
                a dot character in a label.

\DDD            where each D is a digit is the octet corresponding to
                the decimal number described by DDD.  The resulting
                octet is assumed to be text and is not checked for
                special meaning.

( )             Parentheses are used to group data that crosses a line
                boundary.  In effect, line terminations are not
                recognized within parentheses.

;               Semicolon is used to start a comment; the remainder of
                the line is ignored.

5.2. Use of master files to define zones

When a master file is used to load a zone, the operation should be
suppressed if any errors are encountered in the master file.  The
rationale for this is that a single error can have widespread
consequences.  For example, suppose that the RRs defining a delegation
have syntax errors; then the server will return authoritative name
errors for all names in the subzone (except in the case where the
subzone is also present on the server).

Several other validity checks that should be performed in addition to
insuring that the file is syntactically correct:

   1. All RRs in the file should have the same class.

   2. Exactly one SOA RR should be present at the top of the zone.

   3. If delegations are present and glue information is required,
      it should be present.



Mockapetris                                                    [Page 35]

RFC 1035        Domain Implementation and Specification    November 1987


   4. Information present outside of the authoritative nodes in the
      zone should be glue information, rather than the result of an
      origin or similar error.

5.3. Master file example

The following is an example file which might be used to define the
ISI.EDU zone.and is loaded with an origin of ISI.EDU:

@   IN  SOA     VENERA      Action\.domains (
                                 20     ; SERIAL
                                 7200   ; REFRESH
                                 600    ; RETRY
                                 3600000; EXPIRE
                                 60)    ; MINIMUM

        NS      A.ISI.EDU.
        NS      VENERA
        NS      VAXA
        MX      10      VENERA
        MX      20      VAXA

A       A       26.3.0.103

VENERA  A       10.1.0.52
        A       128.9.0.32

VAXA    A       10.2.0.27
        A       128.9.0.33


$INCLUDE <SUBSYS>ISI-MAILBOXES.TXT

Where the file <SUBSYS>ISI-MAILBOXES.TXT is:

    MOE     MB      A.ISI.EDU.
    LARRY   MB      A.ISI.EDU.
    CURLEY  MB      A.ISI.EDU.
    STOOGES MG      MOE
            MG      LARRY
            MG      CURLEY

Note the use of the \ character in the SOA RR to specify the responsible
person mailbox "Action.domains@E.ISI.EDU".







Mockapetris                                                    [Page 36]

RFC 1035        Domain Implementation and Specification    November 1987


6. NAME SERVER IMPLEMENTATION

6.1. Architecture

The optimal structure for the name server will depend on the host
operating system and whether the name server is integrated with resolver
operations, either by supporting recursive service, or by sharing its
database with a resolver.  This section discusses implementation
considerations for a name server which shares a database with a
resolver, but most of these concerns are present in any name server.

6.1.1. Control

A name server must employ multiple concurrent activities, whether they
are implemented as separate tasks in the host's OS or multiplexing
inside a single name server program.  It is simply not acceptable for a
name server to block the service of UDP requests while it waits for TCP
data for refreshing or query activities.  Similarly, a name server
should not attempt to provide recursive service without processing such
requests in parallel, though it may choose to serialize requests from a
single client, or to regard identical requests from the same client as
duplicates.  A name server should not substantially delay requests while
it reloads a zone from master files or while it incorporates a newly
refreshed zone into its database.

6.1.2. Database

While name server implementations are free to use any internal data
structures they choose, the suggested structure consists of three major
parts:

   - A "catalog" data structure which lists the zones available to
     this server, and a "pointer" to the zone data structure.  The
     main purpose of this structure is to find the nearest ancestor
     zone, if any, for arriving standard queries.

   - Separate data structures for each of the zones held by the
     name server.

   - A data structure for cached data. (or perhaps separate caches
     for different classes)

All of these data structures can be implemented an identical tree
structure format, with different data chained off the nodes in different
parts: in the catalog the data is pointers to zones, while in the zone
and cache data structures, the data will be RRs.  In designing the tree
framework the designer should recognize that query processing will need
to traverse the tree using case-insensitive label comparisons; and that



Mockapetris                                                    [Page 37]

RFC 1035        Domain Implementation and Specification    November 1987


in real data, a few nodes have a very high branching factor (100-1000 or
more), but the vast majority have a very low branching factor (0-1).

One way to solve the case problem is to store the labels for each node
in two pieces: a standardized-case representation of the label where all
ASCII characters are in a single case, together with a bit mask that
denotes which characters are actually of a different case.  The
branching factor diversity can be handled using a simple linked list for
a node until the branching factor exceeds some threshold, and
transitioning to a hash structure after the threshold is exceeded.  In
any case, hash structures used to store tree sections must insure that
hash functions and procedures preserve the casing conventions of the
DNS.

The use of separate structures for the different parts of the database
is motivated by several factors:

   - The catalog structure can be an almost static structure that
     need change only when the system administrator changes the
     zones supported by the server.  This structure can also be
     used to store parameters used to control refreshing
     activities.

   - The individual data structures for zones allow a zone to be
     replaced simply by changing a pointer in the catalog.  Zone
     refresh operations can build a new structure and, when
     complete, splice it into the database via a simple pointer
     replacement.  It is very important that when a zone is
     refreshed, queries should not use old and new data
     simultaneously.

   - With the proper search procedures, authoritative data in zones
     will always "hide", and hence take precedence over, cached
     data.

   - Errors in zone definitions that cause overlapping zones, etc.,
     may cause erroneous responses to queries, but problem
     determination is simplified, and the contents of one "bad"
     zone can't corrupt another.

   - Since the cache is most frequently updated, it is most
     vulnerable to corruption during system restarts.  It can also
     become full of expired RR data.  In either case, it can easily
     be discarded without disturbing zone data.

A major aspect of database design is selecting a structure which allows
the name server to deal with crashes of the name server's host.  State
information which a name server should save across system crashes



Mockapetris                                                    [Page 38]

RFC 1035        Domain Implementation and Specification    November 1987


includes the catalog structure (including the state of refreshing for
each zone) and the zone data itself.

6.1.3. Time

Both the TTL data for RRs and the timing data for refreshing activities
depends on 32 bit timers in units of seconds.  Inside the database,
refresh timers and TTLs for cached data conceptually "count down", while
data in the zone stays with constant TTLs.

A recommended implementation strategy is to store time in two ways:  as
a relative increment and as an absolute time.  One way to do this is to
use positive 32 bit numbers for one type and negative numbers for the
other.  The RRs in zones use relative times; the refresh timers and
cache data use absolute times.  Absolute numbers are taken with respect
to some known origin and converted to relative values when placed in the
response to a query.  When an absolute TTL is negative after conversion
to relative, then the data is expired and should be ignored.

6.2. Standard query processing

The major algorithm for standard query processing is presented in
[RFC-1034].

When processing queries with QCLASS=*, or some other QCLASS which
matches multiple classes, the response should never be authoritative
unless the server can guarantee that the response covers all classes.

When composing a response, RRs which are to be inserted in the
additional section, but duplicate RRs in the answer or authority
sections, may be omitted from the additional section.

When a response is so long that truncation is required, the truncation
should start at the end of the response and work forward in the
datagram.  Thus if there is any data for the authority section, the
answer section is guaranteed to be unique.

The MINIMUM value in the SOA should be used to set a floor on the TTL of
data distributed from a zone.  This floor function should be done when
the data is copied into a response.  This will allow future dynamic
update protocols to change the SOA MINIMUM field without ambiguous
semantics.

6.3. Zone refresh and reload processing

In spite of a server's best efforts, it may be unable to load zone data
from a master file due to syntax errors, etc., or be unable to refresh a
zone within the its expiration parameter.  In this case, the name server



Mockapetris                                                    [Page 39]

RFC 1035        Domain Implementation and Specification    November 1987


should answer queries as if it were not supposed to possess the zone.

If a master is sending a zone out via AXFR, and a new version is created
during the transfer, the master should continue to send the old version
if possible.  In any case, it should never send part of one version and
part of another.  If completion is not possible, the master should reset
the connection on which the zone transfer is taking place.

6.4. Inverse queries (Optional)

Inverse queries are an optional part of the DNS.  Name servers are not
required to support any form of inverse queries.  If a name server
receives an inverse query that it does not support, it returns an error
response with the "Not Implemented" error set in the header.  While
inverse query support is optional, all name servers must be at least
able to return the error response.

6.4.1. The contents of inverse queries and responses          Inverse
queries reverse the mappings performed by standard query operations;
while a standard query maps a domain name to a resource, an inverse
query maps a resource to a domain name.  For example, a standard query
might bind a domain name to a host address; the corresponding inverse
query binds the host address to a domain name.

Inverse queries take the form of a single RR in the answer section of
the message, with an empty question section.  The owner name of the
query RR and its TTL are not significant.  The response carries
questions in the question section which identify all names possessing
the query RR WHICH THE NAME SERVER KNOWS.  Since no name server knows
about all of the domain name space, the response can never be assumed to
be complete.  Thus inverse queries are primarily useful for database
management and debugging activities.  Inverse queries are NOT an
acceptable method of mapping host addresses to host names; use the IN-
ADDR.ARPA domain instead.

Where possible, name servers should provide case-insensitive comparisons
for inverse queries.  Thus an inverse query asking for an MX RR of
"Venera.isi.edu" should get the same response as a query for
"VENERA.ISI.EDU"; an inverse query for HINFO RR "IBM-PC UNIX" should
produce the same result as an inverse query for "IBM-pc unix".  However,
this cannot be guaranteed because name servers may possess RRs that
contain character strings but the name server does not know that the
data is character.

When a name server processes an inverse query, it either returns:

   1. zero, one, or multiple domain names for the specified
      resource as QNAMEs in the question section



Mockapetris                                                    [Page 40]

RFC 1035        Domain Implementation and Specification    November 1987


   2. an error code indicating that the name server doesn't support
      inverse mapping of the specified resource type.

When the response to an inverse query contains one or more QNAMEs, the
owner name and TTL of the RR in the answer section which defines the
inverse query is modified to exactly match an RR found at the first
QNAME.

RRs returned in the inverse queries cannot be cached using the same
mechanism as is used for the replies to standard queries.  One reason
for this is that a name might have multiple RRs of the same type, and
only one would appear.  For example, an inverse query for a single
address of a multiply homed host might create the impression that only
one address existed.

6.4.2. Inverse query and response example          The overall structure
of an inverse query for retrieving the domain name that corresponds to
Internet address 10.1.0.52 is shown below:

                         +-----------------------------------------+
           Header        |          OPCODE=IQUERY, ID=997          |
                         +-----------------------------------------+
          Question       |                 <empty>                 |
                         +-----------------------------------------+
           Answer        |        <anyname> A IN 10.1.0.52         |
                         +-----------------------------------------+
          Authority      |                 <empty>                 |
                         +-----------------------------------------+
         Additional      |                 <empty>                 |
                         +-----------------------------------------+

This query asks for a question whose answer is the Internet style
address 10.1.0.52.  Since the owner name is not known, any domain name
can be used as a placeholder (and is ignored).  A single octet of zero,
signifying the root, is usually used because it minimizes the length of
the message.  The TTL of the RR is not significant.  The response to
this query might be:














Mockapetris                                                    [Page 41]

RFC 1035        Domain Implementation and Specification    November 1987


                         +-----------------------------------------+
           Header        |         OPCODE=RESPONSE, ID=997         |
                         +-----------------------------------------+
          Question       |QTYPE=A, QCLASS=IN, QNAME=VENERA.ISI.EDU |
                         +-----------------------------------------+
           Answer        |  VENERA.ISI.EDU  A IN 10.1.0.52         |
                         +-----------------------------------------+
          Authority      |                 <empty>                 |
                         +-----------------------------------------+
         Additional      |                 <empty>                 |
                         +-----------------------------------------+

Note that the QTYPE in a response to an inverse query is the same as the
TYPE field in the answer section of the inverse query.  Responses to
inverse queries may contain multiple questions when the inverse is not
unique.  If the question section in the response is not empty, then the
RR in the answer section is modified to correspond to be an exact copy
of an RR at the first QNAME.

6.4.3. Inverse query processing

Name servers that support inverse queries can support these operations
through exhaustive searches of their databases, but this becomes
impractical as the size of the database increases.  An alternative
approach is to invert the database according to the search key.

For name servers that support multiple zones and a large amount of data,
the recommended approach is separate inversions for each zone.  When a
particular zone is changed during a refresh, only its inversions need to
be redone.

Support for transfer of this type of inversion may be included in future
versions of the domain system, but is not supported in this version.

6.5. Completion queries and responses

The optional completion services described in RFC-882 and RFC-883 have
been deleted.  Redesigned services may become available in the future.













Mockapetris                                                    [Page 42]

RFC 1035        Domain Implementation and Specification    November 1987


7. RESOLVER IMPLEMENTATION

The top levels of the recommended resolver algorithm are discussed in
[RFC-1034].  This section discusses implementation details assuming the
database structure suggested in the name server implementation section
of this memo.

7.1. Transforming a user request into a query

The first step a resolver takes is to transform the client's request,
stated in a format suitable to the local OS, into a search specification
for RRs at a specific name which match a specific QTYPE and QCLASS.
Where possible, the QTYPE and QCLASS should correspond to a single type
and a single class, because this makes the use of cached data much
simpler.  The reason for this is that the presence of data of one type
in a cache doesn't confirm the existence or non-existence of data of
other types, hence the only way to be sure is to consult an
authoritative source.  If QCLASS=* is used, then authoritative answers
won't be available.

Since a resolver must be able to multiplex multiple requests if it is to
perform its function efficiently, each pending request is usually
represented in some block of state information.  This state block will
typically contain:

   - A timestamp indicating the time the request began.
     The timestamp is used to decide whether RRs in the database
     can be used or are out of date.  This timestamp uses the
     absolute time format previously discussed for RR storage in
     zones and caches.  Note that when an RRs TTL indicates a
     relative time, the RR must be timely, since it is part of a
     zone.  When the RR has an absolute time, it is part of a
     cache, and the TTL of the RR is compared against the timestamp
     for the start of the request.

     Note that using the timestamp is superior to using a current
     time, since it allows RRs with TTLs of zero to be entered in
     the cache in the usual manner, but still used by the current
     request, even after intervals of many seconds due to system
     load, query retransmission timeouts, etc.

   - Some sort of parameters to limit the amount of work which will
     be performed for this request.

     The amount of work which a resolver will do in response to a
     client request must be limited to guard against errors in the
     database, such as circular CNAME references, and operational
     problems, such as network partition which prevents the



Mockapetris                                                    [Page 43]

RFC 1035        Domain Implementation and Specification    November 1987


     resolver from accessing the name servers it needs.  While
     local limits on the number of times a resolver will retransmit
     a particular query to a particular name server address are
     essential, the resolver should have a global per-request
     counter to limit work on a single request.  The counter should
     be set to some initial value and decremented whenever the
     resolver performs any action (retransmission timeout,
     retransmission, etc.)  If the counter passes zero, the request
     is terminated with a temporary error.

     Note that if the resolver structure allows one request to
     start others in parallel, such as when the need to access a
     name server for one request causes a parallel resolve for the
     name server's addresses, the spawned request should be started
     with a lower counter.  This prevents circular references in
     the database from starting a chain reaction of resolver
     activity.

   - The SLIST data structure discussed in [RFC-1034].

     This structure keeps track of the state of a request if it
     must wait for answers from foreign name servers.

7.2. Sending the queries

As described in [RFC-1034], the basic task of the resolver is to
formulate a query which will answer the client's request and direct that
query to name servers which can provide the information.  The resolver
will usually only have very strong hints about which servers to ask, in
the form of NS RRs, and may have to revise the query, in response to
CNAMEs, or revise the set of name servers the resolver is asking, in
response to delegation responses which point the resolver to name
servers closer to the desired information.  In addition to the
information requested by the client, the resolver may have to call upon
its own services to determine the address of name servers it wishes to
contact.

In any case, the model used in this memo assumes that the resolver is
multiplexing attention between multiple requests, some from the client,
and some internally generated.  Each request is represented by some
state information, and the desired behavior is that the resolver
transmit queries to name servers in a way that maximizes the probability
that the request is answered, minimizes the time that the request takes,
and avoids excessive transmissions.  The key algorithm uses the state
information of the request to select the next name server address to
query, and also computes a timeout which will cause the next action
should a response not arrive.  The next action will usually be a
transmission to some other server, but may be a temporary error to the



Mockapetris                                                    [Page 44]

RFC 1035        Domain Implementation and Specification    November 1987


client.

The resolver always starts with a list of server names to query (SLIST).
This list will be all NS RRs which correspond to the nearest ancestor
zone that the resolver knows about.  To avoid startup problems, the
resolver should have a set of default servers which it will ask should
it have no current NS RRs which are appropriate.  The resolver then adds
to SLIST all of the known addresses for the name servers, and may start
parallel requests to acquire the addresses of the servers when the
resolver has the name, but no addresses, for the name servers.

To complete initialization of SLIST, the resolver attaches whatever
history information it has to the each address in SLIST.  This will
usually consist of some sort of weighted averages for the response time
of the address, and the batting average of the address (i.e., how often
the address responded at all to the request).  Note that this
information should be kept on a per address basis, rather than on a per
name server basis, because the response time and batting average of a
particular server may vary considerably from address to address.  Note
also that this information is actually specific to a resolver address /
server address pair, so a resolver with multiple addresses may wish to
keep separate histories for each of its addresses.  Part of this step
must deal with addresses which have no such history; in this case an
expected round trip time of 5-10 seconds should be the worst case, with
lower estimates for the same local network, etc.

Note that whenever a delegation is followed, the resolver algorithm
reinitializes SLIST.

The information establishes a partial ranking of the available name
server addresses.  Each time an address is chosen and the state should
be altered to prevent its selection again until all other addresses have
been tried.  The timeout for each transmission should be 50-100% greater
than the average predicted value to allow for variance in response.

Some fine points:

   - The resolver may encounter a situation where no addresses are
     available for any of the name servers named in SLIST, and
     where the servers in the list are precisely those which would
     normally be used to look up their own addresses.  This
     situation typically occurs when the glue address RRs have a
     smaller TTL than the NS RRs marking delegation, or when the
     resolver caches the result of a NS search.  The resolver
     should detect this condition and restart the search at the
     next ancestor zone, or alternatively at the root.





Mockapetris                                                    [Page 45]

RFC 1035        Domain Implementation and Specification    November 1987


   - If a resolver gets a server error or other bizarre response
     from a name server, it should remove it from SLIST, and may
     wish to schedule an immediate transmission to the next
     candidate server address.

7.3. Processing responses

The first step in processing arriving response datagrams is to parse the
response.  This procedure should include:

   - Check the header for reasonableness.  Discard datagrams which
     are queries when responses are expected.

   - Parse the sections of the message, and insure that all RRs are
     correctly formatted.

   - As an optional step, check the TTLs of arriving data looking
     for RRs with excessively long TTLs.  If a RR has an
     excessively long TTL, say greater than 1 week, either discard
     the whole response, or limit all TTLs in the response to 1
     week.

The next step is to match the response to a current resolver request.
The recommended strategy is to do a preliminary matching using the ID
field in the domain header, and then to verify that the question section
corresponds to the information currently desired.  This requires that
the transmission algorithm devote several bits of the domain ID field to
a request identifier of some sort.  This step has several fine points:

   - Some name servers send their responses from different
     addresses than the one used to receive the query.  That is, a
     resolver cannot rely that a response will come from the same
     address which it sent the corresponding query to.  This name
     server bug is typically encountered in UNIX systems.

   - If the resolver retransmits a particular request to a name
     server it should be able to use a response from any of the
     transmissions.  However, if it is using the response to sample
     the round trip time to access the name server, it must be able
     to determine which transmission matches the response (and keep
     transmission times for each outgoing message), or only
     calculate round trip times based on initial transmissions.

   - A name server will occasionally not have a current copy of a
     zone which it should have according to some NS RRs.  The
     resolver should simply remove the name server from the current
     SLIST, and continue.




Mockapetris                                                    [Page 46]

RFC 1035        Domain Implementation and Specification    November 1987


7.4. Using the cache

In general, we expect a resolver to cache all data which it receives in
responses since it may be useful in answering future client requests.
However, there are several types of data which should not be cached:

   - When several RRs of the same type are available for a
     particular owner name, the resolver should either cache them
     all or none at all.  When a response is truncated, and a
     resolver doesn't know whether it has a complete set, it should
     not cache a possibly partial set of RRs.

   - Cached data should never be used in preference to
     authoritative data, so if caching would cause this to happen
     the data should not be cached.

   - The results of an inverse query should not be cached.

   - The results of standard queries where the QNAME contains "*"
     labels if the data might be used to construct wildcards.  The
     reason is that the cache does not necessarily contain existing
     RRs or zone boundary information which is necessary to
     restrict the application of the wildcard RRs.

---