[mod.protocols.tcp-ip] Protocol Development on SUN 2 and 3 computers.

robert@SPAM.ISTC.SRI.COM (Robert Allen) (12/16/86)

Pardon me for posting with a non-TCP/IP related subject, I have
no good excuse...


	I'm wondering if anyone has attempted to develop other
protocols on Sun computers using the "open-architecture".  From
initial inspection of the Sun document "Network Implementation"
it appears that one can provide different protocol routines at
various layers, and make use of the kernel hooks built into the
system, thus providing socket-type interfaces for protocols other
than the currently supported TCP/IP and UDP (I knew I could make
this letter pertinent).

	Specifically, I would like to know: a) whether anyone has tried
this with other protocols, and if so, which ones; b) which layers are
supported in this open architecture; and c) what problems, if any,
were encountered.

	Any comments, questions, pointers, etc. are appreciated.



						Robert Allen
						robert@spam.istc.sri.com
						OR
						robert@sri-spam.ARPA

steve@BRILLIG.UMD.EDU.UUCP (12/16/86)

   The Sun networking implementation is very close to being identical to
the standard 4.2BSD implementation.  Unfortunately, that makes development
of other protocols more troublesome than you might expect (unless they
live on top of IP, in which case it's not bad to do).  If memory serves,
the networking implementation manual (also lifted from a 4.2BSD manual)
is incorrect, or perhaps misleading, when it talks about protocol
independence.  There are all sorts of nasty AF_INET
dependencies lurking about in there, everywhere from the device drivers
to the network interface and routing code to NFS.  It is possible to
track all these dependencies down -- Chris Torek and James O'Toole
did it here when they did their Xerox NS implementation for 4.2BSD --
but it probably won't be a whole lot of fun to do.

   Back under Sun Unix 2.0 I hacked some XNS support into the kernel.
The way I did it was to remark, "gee, the interface between the network
code and the rest of the kernel isn't so bad" and stuff the whole of the
4.3BSD beta networking code into the kernel, throwing the Sun/4.2BSD code
out.  Depending on what you're doing, that may be a win.  I believe
that it was for me, as I didn't have to write an NS implementation if
I worked it that way.  Furthermore, changing the INET-dependent code
is probably not particularly hard, but you'll have to muck with the
innards of almost every kernel module in /sys/net*, and that could
be both tedious and frustrating.  Finally, the 4.3BSD networking
implementation is very much improved over the 4.2BSD one in the area
of TCP/IP, so you get a better TCP/IP in the bargain.

   Oh yes, and of course it looks easier to stuff an entirely new
(non-INET, non-NS) protocol into 4.3BSD than it does into 4.2BSD.
There can't be too many dependencies still lurking about, 'cause
the NS support works.

   Hope this is of use to you.

	-Steve

Spoken: Steve Miller 	ARPA:	steve@mimsy.umd.edu	Phone: +1-301-454-4251
CSNet:	steve@mimsy.umd.edu 	UUCP:	{seismo,allegra}!mimsy!steve
USPS: Computer Science Dept., University of Maryland, College Park, MD 20742

lantz@GREGORIO.STANFORD.EDU (Keith Lantz) (12/17/86)

Folks might also be interested to know that protocol development in
Berkeley UNIX has been rather easy for years at CMU and Stanford, who
jointly developed what is referred to as the "packet filter".  A paper
on the packet filter, by Jeff Mogul, Mike Accetta, and Rick Rashid was
just presented at the Conference on Practical Software Development
Environments.  Perhaps the first thing to know is that it provides for
application-level protocol development, rather than kernel hacking.
For example, that's how our ``UNIX server'' for the V-System is
implemented.

We have been beating on Berkeley for several years to include same with
the BSD distributions, with little success.  Rumor has it that it IS
included in the 4.3 distribution, but as unsupported software.  I am
not offering to support it myself, but if you're sufficiently
interested and vocal enough, who knows who might respond...

Keith

Following is the man page for the 4.3 version of the packet filter.
The 4.2 version differs somewhat.  




ENET(4)             UNIX Programmer's Manual              ENET(4)



NAME
     enet - ethernet packet filter

SYNOPSIS
     pseudo-device enetfilter 64

DESCRIPTION
     The packet filter provides a raw interface to Ethernets and
     similar network data link layers.  Packets received that are
     not used by the kernel (i.e., to support IP, ARP, and on
     some systems XNS, protocols) are available through this
     mechanism.  The packet filter appears as a set of character
     special files, one per hardware interface.  Each enet file
     may be opened multiple times, allowing each interface to be
     used by many processes.  The total number of open ethernet
     files is limited to the value given in the kernel configura-
     tion; the example given in the SYNOPSIS above sets the limit
     to 64.

     The minor device numbers are associated with interfaces when
     the system is booted.  Minor device 0 is associated with the
     first Ethernet interface ``attached'', minor device 1 with
     the second, and so forth.  (These character special files
     are, for historical reasons, given the names /dev/enet0,
     /dev/eneta0, /dev/enetb0, etc.)

     Associated with each open instance of an enet file is a
     user-settable packet filter which is used to deliver incom-
     ing ethernet packets to the appropriate process.  Whenever a
     packet is received from the net, successive packet filters
     from the list of filters for all open enet files are applied
     to the packet.  When a filter accepts the packet, it is
     placed on the packet input queue of the associated file.  If
     no filters accept the packet, it is discarded.  The format
     of a packet filter is described below.

     Reads from these files return the next packet from a queue
     of packets that have matched the filter.  If insufficient
     buffer space to store the entire packet is specified in the
     read, the packet will be truncated and the trailing contents
     lost.  Writes to these devices transmit packets on the net-
     work, with each write generating exactly one packet.

     The packet filter currently supports a variety of different
     ``Ethernet'' data-link levels:

     3mb Ethernet   packets consist of 4 or more bytes with the
                    first byte specifying the source ethernet
                    address, the second byte specifying the des-
                    tination ethernet  address, and the next two
                    bytes specifying the packet type.  (Actually,
                    on the network the source and destination
                    addresses are in the opposite order.)

     byte-swapping 3mb Ethernet
                    packets consist of 4 or more bytes with the
                    first byte specifying the source ethernet
                    address, the second byte specifying the des-
                    tination ethernet address, and the next two
                    bytes specifying the packet type.  Each short
                    word (pair of bytes) is swapped from the net-
                    work byte order; this device type is only
                    provided as a concession to backwards-
                    compatibility.

     10mb Ethernet  packets consist of 14 or more bytes with the
                    first six bytes specifying the destination
                    ethernet address, the next six bytes the
                    source ethernet address, and the next two
                    bytes specifying the packet type.

     The remaining words are interpreted according to the packet
     type.  Note that 16-bit and 32-bit quantities may have to be
     byteswapped (and possibly short-swapped) to be intelligible
     on a Vax.
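
     As a rough illustration of the rearrangements mentioned above,
     the sketch below shows a byte swap of a 16-bit quantity, a
     "short swap" of the two halves of a 32-bit quantity, and a full
     byte reversal of a 32-bit quantity.  The function names are
     hypothetical; none of them is provided by <sys/enet.h>.

```c
#include <stdint.h>

/* Hypothetical helpers; none of these names come from <sys/enet.h>. */

/* Swap the two bytes of a 16-bit quantity: 0x1234 -> 0x3412. */
uint16_t swap_bytes16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Swap the two 16-bit halves of a 32-bit quantity ("short-swapping"):
 * 0x11223344 -> 0x33441122. */
uint32_t swap_shorts32(uint32_t v)
{
    return (v << 16) | (v >> 16);
}

/* Reverse all four bytes of a 32-bit quantity: 0x11223344 -> 0x44332211.
 * This equals byte-swapping each half and then short-swapping. */
uint32_t swap_bytes32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
}
```

     Which of these a program needs depends on how it reads the
     field out of the packet buffer, so treat the choice as something
     to verify against real traffic.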

     The packet filter mechanism does not know anything about the
     data portion of the packets it sends and receives.  The user
     must supply the headers for transmitted packets (although
     the system makes sure that the source address is correct)
     and the headers of received packets are delivered to the
     user.  The packet filters treat the entire packet, including
     headers, as uninterpreted data.
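
     Since the user supplies the headers, a program writing to a
     10mb Ethernet has to lay out the 14-byte header itself.  The
     following is a minimal sketch of that layout, following the
     field order given above.  The helper name, the EN10_* constants
     and the type value used in any call are assumptions made for
     illustration, not part of the driver's interface.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical constants for the 10mb Ethernet layout described above. */
#define EN10_ADDR_LEN 6    /* bytes per address */
#define EN10_HDR_LEN  14   /* 6 destination + 6 source + 2 type */

/*
 * Hypothetical helper: lay out a 10mb Ethernet packet in buf.
 * Returns the total packet length, or 0 if buf is too small.
 */
size_t build_frame(uint8_t *buf, size_t buflen,
                   const uint8_t dst[EN10_ADDR_LEN],
                   const uint8_t src[EN10_ADDR_LEN],
                   uint16_t type,
                   const void *data, size_t datalen)
{
    if (buflen < EN10_HDR_LEN || datalen > buflen - EN10_HDR_LEN)
        return 0;
    memcpy(buf, dst, EN10_ADDR_LEN);                  /* bytes 0-5   */
    memcpy(buf + EN10_ADDR_LEN, src, EN10_ADDR_LEN);  /* bytes 6-11  */
    buf[12] = (uint8_t)(type >> 8);                   /* type, high byte first */
    buf[13] = (uint8_t)(type & 0xFF);
    if (datalen > 0)
        memcpy(buf + EN10_HDR_LEN, data, datalen);    /* data portion */
    return EN10_HDR_LEN + datalen;
}
```

     The buffer filled in this way would then be handed to a single
     write(2) on the enet file, since each write generates exactly
     one packet.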

IOCTL CALLS
     In addition to FIONREAD, ten special ioctl calls may be
     applied to an open enet file.  The first two set and fetch
     parameters for the file and are of the form:

          #include <sys/types.h>
          #include <sys/enet.h>
          ioctl(fildes, code, param)
          struct eniocb *param;

     where param is defined in <sys/enet.h> as:

          struct eniocb
          {
                 u_char  en_addr;
                 u_char  en_maxfilters;
                 u_char  en_maxwaiting;
                 u_char  en_maxpriority;
                 long    en_rtout;
          };

     with the applicable codes being:

     EIOCGETP
          Fetch the parameters for this file.

     EIOCSETP
          Set the parameters for this file.

     The maximum filter length parameter en_maxfilters indicates
     the maximum possible packet filter command list length (see
     EIOCSETF below).  The maximum input wait queue size parameter
     en_maxwaiting indicates the maximum number of packets
     which may be queued for an ethernet file at one time (see
     EIOCSETW below).  The maximum priority parameter
     en_maxpriority indicates the highest filter priority which
     may be set for the file (see EIOCSETF below).  The en_addr
     field is no longer maintained by the driver; see EIOCDEVP
     below.

     The read timeout parameter en_rtout specifies the number of
     clock ticks to wait before timing out on a read request and
     returning an EOF.  This parameter is initialized to zero by
     open(2), indicating no timeout. If it is negative, then read
     requests will return an EOF immediately if there are no
     packets in the input queue.  (Note that all parameters
     except for the read timeout are read-only and are ignored
     when changed.)

     A different ioctl is used to get device parameters of the
     ethernet underlying the minor device.  It is of the form:

          #include <sys/types.h>
          #include <sys/enet.h>
          ioctl(fildes, EIOCDEVP, param)
          struct endevp *param;

     where param is defined in <sys/enet.h> as:

          struct endevp {
                 u_char   end_dev_type;
                 u_char   end_addr_len;
                 u_short  end_hdr_len;
                 u_short  end_MTU;
                 u_char   end_addr[EN_MAX_ADDR_LEN];
                 u_char   end_broadaddr[EN_MAX_ADDR_LEN];
          };

     The fields are:

     end_dev_type   Specifies the device type; currently one of
                    ENDT_3MB, ENDT_BS3MB or ENDT_10MB.

     end_addr_len   Specifies the address length in bytes (e.g.,
                    1 or 6).

     end_hdr_len    Specifies the total header length in bytes
                    (e.g., 4 or 14).

     end_MTU        Specifies the maximum packet size, including
                    header, in bytes.

     end_addr       The address of this interface; aligned so
                    that the low order byte of the address is the
                    first byte in the array.

     end_broadaddr  The hardware destination address for broad-
                    casts on this network.

     The next two calls enable and disable the input packet sig-
     nal mechanism for the file and are of the form:

          #include <sys/types.h>
          #include <sys/enet.h>
          ioctl(fildes, code, signp)
          u_int *signp;

     where signp is a pointer to a word containing the number of
     the signal to be sent when an input packet arrives and with
     the applicable codes being:

     EIOCENBS
          Enable the specified signal when an input packet is
          received for this file.  If the ENHOLDSIG flag (see
          EIOCMBIS below) is not set, further signals are
          automatically disabled whenever a signal is sent to
          prevent nesting and hence must be specifically re-
          enabled after processing.  When a signal number of 0 is
          supplied, this call is equivalent to EIOCINHS.

     EIOCINHS
          Disable any signal when an input packet is received for
          this file (the signp parameter is ignored).  This is
          the default when the file is first opened.

     The next two calls set and clear ``mode bits'' for the file
     and are of the form:

          #include <sys/types.h>
          #include <sys/enet.h>
          ioctl(fildes, code, bits)
          u_short *bits;

     where bits is a short word bit-mask specifying which bits to
     set or clear.  Currently, the only bit mask recognized is
     ENHOLDSIG, which (if clear) means that the driver should
     disable the effect of EIOCENBS once it has delivered a signal.
     Setting this bit means that you need use EIOCENBS only once.
     (For historical reasons, the default is that ENHOLDSIG is set.)
     The applicable codes are:

     EIOCMBIS
          Sets the specified mode bits

     EIOCMBIC
          Clears the specified mode bits

     Another ioctl call is used to set the maximum size of the
     packet input queue for an open enet file.  It is of the
     form:

          #include <sys/types.h>
          #include <sys/enet.h>
          ioctl(fildes, EIOCSETW, maxwaitingp)
          u_int *maxwaitingp;

     where maxwaitingp is a pointer to a word containing the
     input queue size to be set.  If this is greater than maximum
     allowable size (see EIOCGETP above), it is set to the max-
     imum, and if it is zero, it is set to a default value.

     Another ioctl call flushes the queue of incoming packets.
     It is of the form:

          #include <sys/types.h>
          #include <sys/enet.h>
          ioctl(fildes, EIOCFLUSH, 0)

     The final ioctl call is used to set the packet filter for an
     open enet file.  It is of the form:

          #include <sys/types.h>
          #include <sys/enet.h>
          ioctl(fildes, EIOCSETF, filter)
          struct enfilter *filter;

     where enfilter is defined in <sys/enet.h> as:

          struct enfilter
          {
                 u_char   enf_Priority;
                 u_char   enf_FilterLen;
                 u_short  enf_Filter[ENMAXFILTERS];
          };

     A packet filter consists of a priority, the filter command
     list length (in shortwords), and the filter command list
     itself.  Each filter command list specifies a sequence of
     actions which operate on an internal stack.  Each shortword
     of the command list specifies an action from the set {
     ENF_PUSHLIT, ENF_PUSHZERO, ENF_PUSHWORD+N } which respec-
     tively push the next shortword of the command list, zero, or
     shortword N of the incoming packet on the stack, and a
     binary operator from the set { ENF_EQ, ENF_NEQ, ENF_LT,
     ENF_LE, ENF_GT, ENF_GE, ENF_AND, ENF_OR, ENF_XOR } which
     then operates on the top two elements of the stack and
     replaces them with its result.  When both an action and
     operator are specified in the same shortword, the action is
     performed followed by the operation.

     The binary operator can also be from the set { ENF_COR,
     ENF_CAND, ENF_CNOR, ENF_CNAND }.  These are ``short-
     circuit'' operators, in that they terminate the execution of
     the filter immediately if the condition they are checking
     for is found, and continue otherwise.  All pop two elements
     from the stack and compare them for equality; ENF_CAND
     returns false if the result is false; ENF_COR returns true
     if the result is true; ENF_CNAND returns true if the result
     is false; ENF_CNOR returns false if the result is true.
     Unlike the other binary operators, these four do not leave a
     result on the stack, even if they continue.

     The short-circuit operators should be used when possible, to
     reduce the amount of time spent evaluating filters.  When
     they are used, you should also arrange the order of the
     tests so that the filter will succeed or fail as soon as
     possible; for example, checking the Socket field of a Pup
     packet is more likely to indicate failure than the packet
     type field.

     The special action ENF_NOPUSH and the special operator
     ENF_NOP can be used to only perform the binary operation or
     to only push a value on the stack.  Since both are (con-
     veniently) defined to be zero, indicating only an action
     actually specifies the action followed by ENF_NOP, and indi-
     cating only an operation actually specifies ENF_NOPUSH fol-
     lowed by the operation.

     After executing the filter command list, a non-zero value
     (true) left on top of the stack (or an empty stack) causes
     the incoming packet to be accepted for the corresponding
     enet file and a zero value (false) causes the packet to be
     passed through the next packet filter.  (If the filter exits
     as the result of a short-circuit operator, the top-of-stack
     value is ignored.) Specifying an undefined operation or
     action in the command list or performing an illegal opera-
     tion or action (such as pushing a shortword offset past the
     end of the packet or executing a binary operator with fewer
     than two shortwords on the stack) causes a filter to reject
     the packet.
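
     The evaluation rules above can be sketched as a small user-level
     interpreter.  This is only an illustration of the semantics as
     described here, not the driver's code.  In particular, the
     numeric encoding of the ENF_* values (action in the low 10 bits
     of a command word, operator in the high bits), the stack depth
     and the name enf_run are assumptions; the real definitions live
     in <sys/enet.h>.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed encoding, for illustration only: action in the low 10 bits
 * of each command word, operator in the remaining high bits. */
#define ENF_NBPA      10
#define ENF_ACTMASK   ((1u << ENF_NBPA) - 1)
#define ENF_OP(n)     ((n) << ENF_NBPA)

enum { ENF_NOPUSH = 0, ENF_PUSHLIT = 1, ENF_PUSHZERO = 2, ENF_PUSHWORD = 16 };
enum {
    ENF_NOP   = ENF_OP(0),  ENF_EQ   = ENF_OP(1),  ENF_LT   = ENF_OP(2),
    ENF_LE    = ENF_OP(3),  ENF_GT   = ENF_OP(4),  ENF_GE   = ENF_OP(5),
    ENF_AND   = ENF_OP(6),  ENF_OR   = ENF_OP(7),  ENF_XOR  = ENF_OP(8),
    ENF_COR   = ENF_OP(9),  ENF_CAND = ENF_OP(10), ENF_CNOR = ENF_OP(11),
    ENF_CNAND = ENF_OP(12), ENF_NEQ  = ENF_OP(13)
};

#define ENF_STACK 32    /* arbitrary stack depth for this sketch */

/*
 * Run command list f (flen shortwords) against a packet viewed as
 * nwords shortwords.  Returns 1 to accept the packet, 0 to reject it.
 */
int enf_run(const uint16_t *f, size_t flen,
            const uint16_t *pkt, size_t nwords)
{
    uint16_t stack[ENF_STACK];
    size_t sp = 0;                          /* values currently on stack */

    for (size_t pc = 0; pc < flen; pc++) {
        unsigned action = f[pc] & ENF_ACTMASK;
        unsigned op = f[pc] & ~ENF_ACTMASK;

        /* The action, if any, pushes exactly one shortword. */
        if (action != ENF_NOPUSH) {
            uint16_t val;
            if (action == ENF_PUSHLIT) {
                if (++pc >= flen)
                    return 0;               /* literal is missing */
                val = f[pc];
            } else if (action == ENF_PUSHZERO) {
                val = 0;
            } else if (action >= ENF_PUSHWORD) {
                size_t n = action - ENF_PUSHWORD;
                if (n >= nwords)
                    return 0;               /* offset past end of packet */
                val = pkt[n];
            } else {
                return 0;                   /* undefined action */
            }
            if (sp == ENF_STACK)
                return 0;                   /* stack overflow */
            stack[sp++] = val;
        }

        if (op == ENF_NOP)
            continue;
        if (sp < 2)
            return 0;                       /* too few operands */
        uint16_t b = stack[--sp];           /* top of stack */
        uint16_t a = stack[--sp];           /* element below it */

        switch (op) {
        case ENF_EQ:    stack[sp++] = (a == b); break;
        case ENF_NEQ:   stack[sp++] = (a != b); break;
        case ENF_LT:    stack[sp++] = (a <  b); break;
        case ENF_LE:    stack[sp++] = (a <= b); break;
        case ENF_GT:    stack[sp++] = (a >  b); break;
        case ENF_GE:    stack[sp++] = (a >= b); break;
        case ENF_AND:   stack[sp++] = a & b;    break;
        case ENF_OR:    stack[sp++] = a | b;    break;
        case ENF_XOR:   stack[sp++] = a ^ b;    break;
        /* Short-circuit operators leave nothing on the stack. */
        case ENF_COR:   if (a == b) return 1;   break;
        case ENF_CAND:  if (a != b) return 0;   break;
        case ENF_CNOR:  if (a == b) return 0;   break;
        case ENF_CNAND: if (a != b) return 1;   break;
        default:        return 0;           /* undefined operator */
        }
    }
    /* An empty stack or a non-zero top of stack accepts the packet. */
    return sp == 0 || stack[sp - 1] != 0;
}
```

     Running a command list through such a routine against a few
     hand-built packets is a cheap way to check a filter before
     installing it with EIOCSETF.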



     In an attempt to deal with the problem of overlapping and/or
     conflicting packet filters, the filters for each open enet
     file are ordered by the driver according to their priority
     (lowest priority is 0, highest is 255).  When processing
     incoming ethernet packets, filters are applied according to
     their priority (from highest to lowest) and for identical
     priority values according to their relative ``busyness''
     (the filter that has previously matched the most packets is
     checked first) until one or more filters accept the packet
     or all filters reject it and it is discarded.

     Filters at a priority of 2 or higher are called "high prior-
     ity" filters.  Once a packet is delivered to one of these
     "high priority" enet files, no further filters are examined,
     i.e. the packet is delivered only to the first enet file
     with a "high priority" filter which accepts the packet.  A
     packet may be delivered to more than one filter with a
     priority below 2; this might be useful, for example, in
     building replicated programs.  However, the use of low-
     priority filters imposes an additional cost on the system,
     as these filters each must be checked against all packets
     not accepted by a high-priority filter.
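
     The ordering and delivery rules above can be sketched as
     follows.  The structure and function names are hypothetical and
     the driver's real data structures are not shown in this manual
     page; the accepts field simply stands in for the result of
     running that file's filter on the packet.

```c
#include <stddef.h>
#include <stdlib.h>

#define HIGH_PRIORITY 2   /* threshold stated in the text above */

/* Hypothetical per-file record, for illustration only. */
struct enfile {
    unsigned char priority;   /* 0 (lowest) .. 255 (highest) */
    unsigned long matched;    /* packets matched so far ("busyness") */
    int accepts;              /* stand-in for the filter's verdict */
    int delivered;            /* set when the packet is queued here */
};

/* Order by priority, highest first; break ties by busyness. */
static int by_priority(const void *pa, const void *pb)
{
    const struct enfile *a = pa, *b = pb;
    if (a->priority != b->priority)
        return (a->priority < b->priority) ? 1 : -1;
    if (a->matched != b->matched)
        return (a->matched < b->matched) ? 1 : -1;
    return 0;
}

/* Offer one packet to every open file; returns how many received it. */
size_t deliver(struct enfile *files, size_t n)
{
    size_t count = 0;

    qsort(files, n, sizeof files[0], by_priority);
    for (size_t i = 0; i < n; i++)
        files[i].delivered = 0;
    for (size_t i = 0; i < n; i++) {
        if (!files[i].accepts)
            continue;
        files[i].delivered = 1;
        files[i].matched++;
        count++;
        if (files[i].priority >= HIGH_PRIORITY)
            break;            /* high-priority match ends the search */
    }
    return count;
}
```

     A real driver would presumably keep the list ordered rather than
     sorting on every packet; the sort is done inline here only to
     keep the sketch short.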

     The packet filter for an enet file is initialized with
     length 0 at priority 0 by open(2), and hence by default
     accepts all packets which no "high priority" filter is
     interested in.

     Priorities should be assigned so that, in general, the more
     packets a filter is expected to match, the higher its prior-
     ity.  This will prevent a lot of needless checking of pack-
     ets against filters that aren't likely to match them.

FILTER EXAMPLES
     The following filter would accept all incoming Pup packets
     on a 3mb ethernet with Pup types in the range 1-0100:

     struct enfilter f =
     {
         10, 19,                                 /* priority and length */
         ENF_PUSHWORD+1, ENF_PUSHLIT, 2,
                 ENF_EQ,                         /* packet type == PUP */
         ENF_PUSHWORD+3, ENF_PUSHLIT,
                 0xFF00, ENF_AND,                /* mask high byte */
         ENF_PUSHZERO, ENF_GT,                   /* PupType > 0 */
         ENF_PUSHWORD+3, ENF_PUSHLIT,
                 0xFF00, ENF_AND,                /* mask high byte */
         ENF_PUSHLIT, 0100, ENF_LE,              /* PupType <= 0100 */
         ENF_AND,                                /* 0 < PupType <= 0100 */
         ENF_AND                                 /* && packet type == PUP */
     };

     Note that shortwords, such as the packet type field, are
     byte-swapped and so the literals you compare them to must be
     byte-swapped. Also, although for this example the word
     offsets are constants, code that must run with either 3mb or
     10mb ethernets must use offsets that depend on the device
     type.
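
     One way to make the word offsets depend on the device is to
     derive them from the header length reported by EIOCDEVP in
     end_hdr_len.  The sketch below assumes, as holds for both the
     3mb and 10mb layouts described earlier, that the packet type
     occupies the last two bytes of the header.  The function names
     are hypothetical.

```c
/*
 * Shortword index of the packet type field, assuming it occupies the
 * last two bytes of the header.  Returns -1 if hdr_len is odd or too
 * short to hold a type field.
 */
int type_word(unsigned hdr_len)
{
    if (hdr_len < 2 || hdr_len % 2 != 0)
        return -1;
    return (int)(hdr_len / 2) - 1;
}

/*
 * Shortword index, counted from the start of the packet, of shortword
 * n of the data portion.  Returns -1 if hdr_len is odd.
 */
int data_word(unsigned hdr_len, unsigned n)
{
    if (hdr_len % 2 != 0)
        return -1;
    return (int)(hdr_len / 2 + n);
}
```

     With a 4-byte header these give word 1 for the packet type and
     word 7 for data shortword 5; with a 14-byte header the same
     fields fall at words 6 and 12.  A filter would then use
     ENF_PUSHWORD plus the computed index in place of a constant.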

     By taking advantage of the ability to specify both an action
     and operation in each word of the command list, the filter
     could be abbreviated to:

     struct enfilter f =
     {
         10, 14,                                     /* priority and length */
         ENF_PUSHWORD+1, ENF_PUSHLIT | ENF_EQ, 2,    /* packet type == PUP */
         ENF_PUSHWORD+3, ENF_PUSHLIT | ENF_AND,
                 0xFF00,                             /* mask high byte */
         ENF_PUSHZERO | ENF_GT,                      /* PupType > 0 */
         ENF_PUSHWORD+3, ENF_PUSHLIT | ENF_AND,
                 0xFF00,                             /* mask high byte */
         ENF_PUSHLIT | ENF_LE, 0100,                 /* PupType <= 0100 */
         ENF_AND,                                    /* 0 < PupType <= 0100 */
         ENF_AND                                     /* && packet type == PUP */
     };

     A different example shows the use of "short-circuit" opera-
     tors to create a more efficient filter.  This one accepts
     Pup packets (on a 3Mbit ethernet) with a Socket field of
     12345.  Note that we check the Socket field before the
     packet type field, since in most packets the Socket is not
     likely to match.

     struct enfilter f =
     {
         10, 9,                                      /* priority and length */
         ENF_PUSHWORD+7, ENF_PUSHLIT | ENF_CAND,
                 0,                                  /* High word of socket */
         ENF_PUSHWORD+8, ENF_PUSHLIT | ENF_CAND,
                 12345,                              /* Low word of socket */
         ENF_PUSHWORD+1, ENF_PUSHLIT | ENF_CAND,
                 2                                   /* packet type == Pup */
     };

SEE ALSO
     de(4), ec(4), en(4), il(4), enstat(8)

FILES
     /dev/enet{,a,b,c,...}0

BUGS
     The current implementation can only filter on words within
     the first "mbuf" of the packet; this is around 100 bytes (or
     50 words).

     Because packets are streams of bytes, yet the filters
     operate on short words, and standard network byte order is
     usually opposite from Vax byte order, the relational opera-
     tors ENF_LT, ENF_LE, ENF_GT, and ENF_GE are not all that
     useful.  Fortunately, they were not often used when the
     packets were treated as streams of shorts, so this is prob-
     ably not a severe problem.  If this becomes a severe prob-
     lem, a byte-swapping operator could be added.

     Many of the "features" of this driver are there for histori-
     cal reasons; the manual page could be a lot cleaner if these
     were left out.

HISTORY
     8-Oct-85  Jeffrey Mogul at Stanford University
          Revised to describe 4.3BSD version of driver.

     18-Oct-84  Jeffrey Mogul at Stanford University
          Added short-circuit operators, changed discussion of
          priorities to reflect new arrangement.

     18-Jan-84  Jeffrey Mogul at Stanford University
          Updated for 4.2BSD (device-independent) version,
          including documentation of all non-kernel ioctls.

     17-Nov-81  Mike Accetta (mja) at Carnegie-Mellon University
          Added mention of <sys/types.h> to include examples.

     29-Sep-81  Mike Accetta (mja) at Carnegie-Mellon University
          Changed to describe new EIOCSETW and EIOCFLUSH ioctl
          calls and the new multiple packet queuing features.

     12-Nov-80  Mike Accetta (mja) at Carnegie-Mellon University
          Added description of signal mechanism for input pack-
          ets.

     07-Mar-80  Mike Accetta (mja) at Carnegie-Mellon University
          Created.

Printed 9/6/86           8 October 1985                         9

Rudy.Nedved@H.CS.CMU.EDU.UUCP (12/19/86)

The ENET filter mechanism is nice but CMU is using the 
BSD sockets mechanism. We still have a few applications
using the old mechanism but under a compatibility flag
and plan to flush the stuff.

From the perspective of a network application hacker, the
only major thing that is missing from the BSD mechanism
is the ability to filter certain types of raw packets. It
should not be necessary for one to modify the operating
system in order to see all packets of a certain type, or
length or from a certain host or going to a certain host.

The minor problems we have been ignoring. I don't like the
fact that the operating system code design seems to believe
it knows when data should be flushed for an SMTP connection
since it knows how to do it for a telnet and ftp data
connection. Heck, I like to get my performance anywhere I
can and dang...computers should be smart for the novice but
should not take control away from the expert...I hate waiting
for kernel bug fixes when a little more application level
control could create a work-around for the application.

-Rudy