robert@SPAM.ISTC.SRI.COM (Robert Allen) (12/16/86)
Pardon me for posting on a subject not related to TCP/IP; I have no
good excuse...

I'm wondering if anyone has attempted to develop other protocols on
Sun computers using the "open-architecture". From initial inspection
of the Sun document "Network Implementation", it appears that one can
provide different protocol routines at various layers and make use of
the kernel hooks built into the system, thus providing socket-type
interfaces for protocols other than the currently supported TCP/IP
and UDP (I knew I could make this letter pertinent).

Specifically, I would like to know:

a) whether anyone has tried this with other protocols, and if so,
   which protocols,
b) which layers are supported in this open architecture, and
c) what problems were encountered, if any.

Any comments, questions, pointers, etc. are appreciated.

Robert Allen
robert@spam.istc.sri.com OR robert@sri-spam.ARPA
steve@BRILLIG.UMD.EDU.UUCP (12/16/86)
The Sun networking implementation is very close to being identical to
the standard 4.2BSD implementation. Unfortunately, that makes
development of other protocols (unless they live on top of IP, in
which case it's not bad to do) more troublesome than you might
expect: if memory serves, the networking implementation manual (also
lifted from a 4.2BSD manual) is incorrect (or, perhaps, misleading)
when it talks about protocol independence. There are all sorts of
nasty AF_INET dependencies lurking about in there, everywhere from
the device drivers to the network interface and routing code to NFS.
It is possible to track all these dependencies down -- Chris Torek
and James O'Toole did it here when they did their Xerox NS
implementation for 4.2BSD -- but it probably won't be a whole lot of
fun to do.

Back under Sun Unix 2.0 I hacked some XNS support into the kernel.
The way I did it was to remark, "gee, the interface between the
network code and the rest of the kernel isn't so bad" and stuff the
whole of the 4.3BSD beta networking code into the kernel, throwing
the Sun/4.2BSD code out. Depending on what you're doing, that may be
a win. I believe that it was for me, as I didn't have to write an NS
implementation if I worked it that way.

Furthermore, changing the INET-dependent code is probably not
particularly hard, but you'll have to muck with the innards of almost
every kernel module in /sys/net*, and that could be both tedious and
frustrating.

Finally, the 4.3BSD networking implementation is very much improved
over the 4.2BSD one in the area of TCP/IP, so you get a better TCP/IP
in the bargain. Oh yes, and of course it looks easier to stuff an
entirely new (non-INET, non-NS) protocol into 4.3BSD than it does
into 4.2BSD. There can't be too many dependencies still lurking
about, 'cause the NS support works.

Hope this is of use to you.
	-Steve

Spoken: Steve Miller
ARPA:   steve@mimsy.umd.edu
Phone:  +1-301-454-4251
CSNet:  steve@mimsy.umd.edu
UUCP:   {seismo,allegra}!mimsy!steve
USPS:   Computer Science Dept., University of Maryland,
        College Park, MD 20742
lantz@GREGORIO.STANFORD.EDU (Keith Lantz) (12/17/86)
Folks might also be interested to know that protocol development in
Berkeley UNIX has been rather easy for years at CMU and Stanford, who
jointly developed what is referred to as the "packet filter". A paper
on the packet filter, by Jeff Mogul, Mike Accetta, and Rick Rashid,
was just presented at the Conference on Practical Software
Development Environments. Perhaps the first thing to know is that it
provides for application-level protocol development, rather than
kernel hacking. For example, that's how our ``UNIX server'' for the
V-System is implemented.

We have been beating on Berkeley for several years to include same
with the BSD distributions, with little success. Rumor has it that it
IS included in the 4.3 distribution, but as unsupported software. I
am not offering to support it myself, but if you're sufficiently
interested and vocal enough, who knows who might respond...

Keith

Following is the man page for the 4.3 version of the packet filter.
The 4.2 version differs somewhat.

ENET(4)             UNIX Programmer's Manual              ENET(4)

NAME
     enet - ethernet packet filter

SYNOPSIS
     pseudo-device enetfilter 64

DESCRIPTION
     The packet filter provides a raw interface to Ethernets and
     similar network data link layers. Packets received that are
     not used by the kernel (i.e., to support IP, ARP, and on some
     systems XNS, protocols) are available through this mechanism.

     The packet filter appears as a set of character special files,
     one per hardware interface. Each enet file may be opened
     multiple times, allowing each interface to be used by many
     processes. The total number of open ethernet files is limited
     to the value given in the kernel configuration; the example
     given in the SYNOPSIS above sets the limit to 64.

     The minor device numbers are associated with interfaces when
     the system is booted. Minor device 0 is associated with the
     first Ethernet interface ``attached'', minor device 1 with the
     second, and so forth.
     (These character special files are, for historical reasons,
     given the names /dev/enet0, /dev/eneta0, /dev/enetb0, etc.)

     Associated with each open instance of an enet file is a
     user-settable packet filter which is used to deliver incoming
     ethernet packets to the appropriate process. Whenever a packet
     is received from the net, successive packet filters from the
     list of filters for all open enet files are applied to the
     packet. When a filter accepts the packet, it is placed on the
     packet input queue of the associated file. If no filters
     accept the packet, it is discarded. The format of a packet
     filter is described below.

     Reads from these files return the next packet from a queue of
     packets that have matched the filter. If insufficient buffer
     space to store the entire packet is specified in the read, the
     packet will be truncated and the trailing contents lost.
     Writes to these devices transmit packets on the network, with
     each write generating exactly one packet.

     The packet filter currently supports a variety of different
     ``Ethernet'' data-link levels:

     3mb Ethernet
               packets consist of 4 or more bytes with the first
               byte specifying the source ethernet address, the
               second byte specifying the destination ethernet
               address, and the next two bytes specifying the
               packet type. (Actually, on the network the source
               and destination addresses are in the opposite
               order.)

     byte-swapping 3mb Ethernet
               packets consist of 4 or more bytes with the first
               byte specifying the source ethernet address, the
               second byte specifying the destination ethernet
               address, and the next two bytes specifying the
               packet type. Each short word (pair of bytes) is
               swapped from the network byte order; this device
               type is only provided as a concession to
               backwards-compatibility.

     10mb Ethernet
               packets consist of 14 or more bytes with the first
               six bytes specifying the destination ethernet
               address, the next six bytes the source ethernet
               address, and the next two bytes specifying the
               packet type.

     The remaining words are interpreted according to the packet
     type. Note that 16-bit and 32-bit quantities may have to be
     byteswapped (and possibly short-swapped) to be intelligible on
     a Vax.

     The packet filter mechanism does not know anything about the
     data portion of the packets it sends and receives. The user
     must supply the headers for transmitted packets (although the
     system makes sure that the source address is correct) and the
     headers of received packets are delivered to the user. The
     packet filters treat the entire packet, including headers, as
     uninterpreted data.

IOCTL CALLS
     In addition to FIONREAD, ten special ioctl calls may be
     applied to an open enet file. The first two set and fetch
     parameters for the file and are of the form:

          #include <sys/types.h>
          #include <sys/enet.h>

          ioctl(fildes, code, param)
          struct eniocb *param;

     where param is defined in <sys/enet.h> as:

          struct eniocb {
                  u_char  en_addr;
                  u_char  en_maxfilters;
                  u_char  en_maxwaiting;
                  u_char  en_maxpriority;
                  long    en_rtout;
          };

     with the applicable codes being:

     EIOCGETP  Fetch the parameters for this file.

     EIOCSETP  Set the parameters for this file.

     The maximum filter length parameter en_maxfilters indicates
     the maximum possible packet filter command list length (see
     EIOCSETF below). The maximum input wait queue size parameter
     en_maxwaiting indicates the maximum number of packets which
     may be queued for an ethernet file at one time (see EIOCSETW
     below). The maximum priority parameter en_maxpriority
     indicates the highest filter priority which may be set for the
     file (see EIOCSETF below). The en_addr field is no longer
     maintained by the driver; see EIOCDEVP below.
     The read timeout parameter en_rtout specifies the number of
     clock ticks to wait before timing out on a read request and
     returning an EOF. This parameter is initialized to zero by
     open(2), indicating no timeout. If it is negative, then read
     requests will return an EOF immediately if there are no
     packets in the input queue. (Note that all parameters except
     for the read timeout are read-only and are ignored when
     changed.)

     A different ioctl is used to get device parameters of the
     ethernet underlying the minor device. It is of the form:

          #include <sys/types.h>
          #include <sys/enet.h>

          ioctl(fildes, EIOCDEVP, param)

     where param is defined in <sys/enet.h> as:

          struct endevp {
                  u_char  end_dev_type;
                  u_char  end_addr_len;
                  u_short end_hdr_len;
                  u_short end_MTU;
                  u_char  end_addr[EN_MAX_ADDR_LEN];
                  u_char  end_broadaddr[EN_MAX_ADDR_LEN];
          };

     The fields are:

     end_dev_type   Specifies the device type; currently one of
                    ENDT_3MB, ENDT_BS3MB or ENDT_10MB.

     end_addr_len   Specifies the address length in bytes (e.g.,
                    1 or 6).

     end_hdr_len    Specifies the total header length in bytes
                    (e.g., 4 or 14).

     end_MTU        Specifies the maximum packet size, including
                    header, in bytes.

     end_addr       The address of this interface; aligned so that
                    the low order byte of the address is the first
                    byte in the array.

     end_broadaddr  The hardware destination address for
                    broadcasts on this network.

     The next two calls enable and disable the input packet signal
     mechanism for the file and are of the form:

          #include <sys/types.h>
          #include <sys/enet.h>

          ioctl(fildes, code, signp)
          u_int *signp;

     where signp is a pointer to a word containing the number of
     the signal to be sent when an input packet arrives and with
     the applicable codes being:

     EIOCENBS  Enable the specified signal when an input packet is
               received for this file.
               If the ENHOLDSIG flag (see EIOCMBIS below) is not
               set, further signals are automatically disabled
               whenever a signal is sent to prevent nesting and
               hence must be specifically re-enabled after
               processing. When a signal number of 0 is supplied,
               this call is equivalent to EIOCINHS.

     EIOCINHS  Disable any signal when an input packet is received
               for this file (the signp parameter is ignored).
               This is the default when the file is first opened.

     The next two calls set and clear ``mode bits'' for the file
     and are of the form:

          #include <sys/types.h>
          #include <sys/enet.h>

          ioctl(fildes, code, bits)
          u_short *bits;

     where bits is a short word bit-mask specifying which bits to
     set or clear. Currently, the only bit mask recognized is
     ENHOLDSIG, which (if clear) means that the driver should
     disable the effect of EIOCENBS once it has delivered a signal.
     Setting this bit means that you need use EIOCENBS only once.
     (For historical reasons, the default is that ENHOLDSIG is
     set.) The applicable codes are:

     EIOCMBIS  Sets the specified mode bits.

     EIOCMBIC  Clears the specified mode bits.

     Another ioctl call is used to set the maximum size of the
     packet input queue for an open enet file. It is of the form:

          #include <sys/types.h>
          #include <sys/enet.h>

          ioctl(fildes, EIOCSETW, maxwaitingp)
          u_int *maxwaitingp;

     where maxwaitingp is a pointer to a word containing the input
     queue size to be set. If this is greater than the maximum
     allowable size (see EIOCGETP above), it is set to the maximum,
     and if it is zero, it is set to a default value.

     Another ioctl call flushes the queue of incoming packets. It
     is of the form:

          #include <sys/types.h>
          #include <sys/enet.h>

          ioctl(fildes, EIOCFLUSH, 0)

     The final ioctl call is used to set the packet filter for an
     open enet file.
     It is of the form:

          #include <sys/types.h>
          #include <sys/enet.h>

          ioctl(fildes, EIOCSETF, filter)
          struct enfilter *filter;

     where enfilter is defined in <sys/enet.h> as:

          struct enfilter {
                  u_char  enf_Priority;
                  u_char  enf_FilterLen;
                  u_short enf_Filter[ENMAXFILTERS];
          };

     A packet filter consists of a priority, the filter command
     list length (in shortwords), and the filter command list
     itself. Each filter command list specifies a sequence of
     actions which operate on an internal stack. Each shortword of
     the command list specifies an action from the set
     { ENF_PUSHLIT, ENF_PUSHZERO, ENF_PUSHWORD+N } which
     respectively push the next shortword of the command list,
     zero, or shortword N of the incoming packet on the stack, and
     a binary operator from the set { ENF_EQ, ENF_NEQ, ENF_LT,
     ENF_LE, ENF_GT, ENF_GE, ENF_AND, ENF_OR, ENF_XOR } which then
     operates on the top two elements of the stack and replaces
     them with its result. When both an action and operator are
     specified in the same shortword, the action is performed
     followed by the operation.

     The binary operator can also be from the set { ENF_COR,
     ENF_CAND, ENF_CNOR, ENF_CNAND }. These are ``short-circuit''
     operators, in that they terminate the execution of the filter
     immediately if the condition they are checking for is found,
     and continue otherwise. All pop two elements from the stack
     and compare them for equality; ENF_CAND returns false if the
     result is false; ENF_COR returns true if the result is true;
     ENF_CNAND returns true if the result is false; ENF_CNOR
     returns false if the result is true. Unlike the other binary
     operators, these four do not leave a result on the stack,
     even if they continue.

     The short-circuit operators should be used when possible, to
     reduce the amount of time spent evaluating filters.
     When they are used, you should also arrange the order of the
     tests so that the filter will succeed or fail as soon as
     possible; for example, checking the Socket field of a Pup
     packet is more likely to indicate failure than the packet
     type field.

     The special action ENF_NOPUSH and the special operator
     ENF_NOP can be used to only perform the binary operation or
     to only push a value on the stack. Since both are
     (conveniently) defined to be zero, indicating only an action
     actually specifies the action followed by ENF_NOP, and
     indicating only an operation actually specifies ENF_NOPUSH
     followed by the operation.

     After executing the filter command list, a non-zero value
     (true) left on top of the stack (or an empty stack) causes
     the incoming packet to be accepted for the corresponding enet
     file and a zero value (false) causes the packet to be passed
     through the next packet filter. (If the filter exits as the
     result of a short-circuit operator, the top-of-stack value is
     ignored.) Specifying an undefined operation or action in the
     command list or performing an illegal operation or action
     (such as pushing a shortword offset past the end of the
     packet or executing a binary operator with fewer than two
     shortwords on the stack) causes a filter to reject the
     packet.

     In an attempt to deal with the problem of overlapping and/or
     conflicting packet filters, the filters for each open enet
     file are ordered by the driver according to their priority
     (lowest priority is 0, highest is 255). When processing
     incoming ethernet packets, filters are applied according to
     their priority (from highest to lowest) and for identical
     priority values according to their relative ``busyness'' (the
     filter that has previously matched the most packets is
     checked first) until one or more filters accept the packet or
     all filters reject it and it is discarded.
     Filters at a priority of 2 or higher are called "high
     priority" filters. Once a packet is delivered to one of these
     "high priority" enet files, no further filters are examined,
     i.e. the packet is delivered only to the first enet file with
     a "high priority" filter which accepts the packet. A packet
     may be delivered to more than one filter with a priority
     below 2; this might be useful, for example, in building
     replicated programs. However, the use of low-priority filters
     imposes an additional cost on the system, as these filters
     each must be checked against all packets not accepted by a
     high-priority filter.

     The packet filter for an enet file is initialized with length
     0 at priority 0 by open(2), and hence by default accepts all
     packets which no "high priority" filter is interested in.

     Priorities should be assigned so that, in general, the more
     packets a filter is expected to match, the higher its
     priority. This will prevent a lot of needless checking of
     packets against filters that aren't likely to match them.

FILTER EXAMPLES
     The following filter would accept all incoming Pup packets on
     a 3mb ethernet with Pup types in the range 1-0100:

          struct enfilter f = {
              10, 19,                       /* priority and length */
              ENF_PUSHWORD+1, ENF_PUSHLIT, 2,
                  ENF_EQ,                   /* packet type == PUP */
              ENF_PUSHWORD+3, ENF_PUSHLIT, 0xFF00,
                  ENF_AND,                  /* mask high byte */
              ENF_PUSHZERO, ENF_GT,         /* PupType > 0 */
              ENF_PUSHWORD+3, ENF_PUSHLIT, 0xFF00,
                  ENF_AND,                  /* mask high byte */
              ENF_PUSHLIT, 0100, ENF_LE,    /* PupType <= 0100 */
              ENF_AND,                      /* 0 < PupType <= 0100 */
              ENF_AND                       /* && packet type == PUP */
          };

     Note that shortwords, such as the packet type field, are
     byte-swapped and so the literals you compare them to must be
     byte-swapped. Also, although for this example the word
     offsets are constants, code that must run with either 3mb or
     10mb ethernets must use offsets that depend on the device
     type.
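
The byte-swapping note above can be illustrated with a small helper. This is only a sketch, not part of the driver or of <sys/enet.h>: the function name swap16 is made up for illustration, and the claim that the Pup packet type appears on the wire as 0x0200 is an assumption that happens to agree with the literal 2 used in the example filter.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from <sys/enet.h>): exchange the two bytes
 * of a 16-bit shortword.  A filter literal for a byte-swapped host
 * would be the network-order value passed through this function.
 */
uint16_t swap16(uint16_t w)
{
    return (uint16_t)((w << 8) | (w >> 8));
}
```

Under that assumption, swap16(0x0200) yields 2, which is the value the example compares the packet type word against.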

     By taking advantage of the ability to specify both an action
     and operation in each word of the command list, the filter
     could be abbreviated to:

          struct enfilter f = {
              10, 14,                       /* priority and length */
              ENF_PUSHWORD+1,
              ENF_PUSHLIT | ENF_EQ, 2,      /* packet type == PUP */
              ENF_PUSHWORD+3,
              ENF_PUSHLIT | ENF_AND, 0xFF00, /* mask high byte */
              ENF_PUSHZERO | ENF_GT,        /* PupType > 0 */
              ENF_PUSHWORD+3,
              ENF_PUSHLIT | ENF_AND, 0xFF00, /* mask high byte */
              ENF_PUSHLIT | ENF_LE, 0100,   /* PupType <= 0100 */
              ENF_AND,                      /* 0 < PupType <= 0100 */
              ENF_AND                       /* && packet type == PUP */
          };

     A different example shows the use of "short-circuit"
     operators to create a more efficient filter. This one accepts
     Pup packets (on a 3Mbit ethernet) with a Socket field of
     12345. Note that we check the Socket field before the packet
     type field, since in most packets the Socket is not likely to
     match.

          struct enfilter f = {
              10, 9,                        /* priority and length */
              ENF_PUSHWORD+7,
              ENF_PUSHLIT | ENF_CAND, 0,    /* High word of socket */
              ENF_PUSHWORD+8,
              ENF_PUSHLIT | ENF_CAND, 12345, /* Low word of socket */
              ENF_PUSHWORD+1,
              ENF_PUSHLIT | ENF_CAND, 2     /* packet type == Pup */
          };

SEE ALSO
     de(4), ec(4), en(4), il(4), enstat(8)

FILES
     /dev/enet{,a,b,c,...}0

BUGS
     The current implementation can only filter on words within
     the first "mbuf" of the packet; this is around 100 bytes (or
     50 words).

     Because packets are streams of bytes, yet the filters operate
     on short words, and standard network byte order is usually
     opposite from Vax byte order, the relational operators
     ENF_LT, ENF_LE, ENF_GT, and ENF_GE are not all that useful.
     Fortunately, they were not often used when the packets were
     treated as streams of shorts, so this is probably not a
     severe problem. If this becomes a severe problem, a
     byte-swapping operator could be added.

     Many of the "features" of this driver are there for
     historical reasons; the manual page could be a lot cleaner if
     these were left out.

HISTORY
     8-Oct-85  Jeffrey Mogul at Stanford University
          Revised to describe 4.3BSD version of driver.

     18-Oct-84  Jeffrey Mogul at Stanford University
          Added short-circuit operators, changed discussion of
          priorities to reflect new arrangement.

     18-Jan-84  Jeffrey Mogul at Stanford University
          Updated for 4.2BSD (device-independent) version,
          including documentation of all non-kernel ioctls.

     17-Nov-81  Mike Accetta (mja) at Carnegie-Mellon University
          Added mention of <sys/types.h> to include examples.

     29-Sep-81  Mike Accetta (mja) at Carnegie-Mellon University
          Changed to describe new EIOCSETW and EIOCFLUSH ioctl
          calls and the new multiple packet queuing features.

     12-Nov-80  Mike Accetta (mja) at Carnegie-Mellon University
          Added description of signal mechanism for input packets.

     07-Mar-80  Mike Accetta (mja) at Carnegie-Mellon University
          Created.

Printed 9/6/86                                    8 October 1985
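
For readers who want to experiment with the filter language without the driver, the evaluation rules in the man page above can be approximated in user space. What follows is a rough sketch, not the driver's code: every SK_ name, the bit layout (action in the low 10 bits, operator above it), the numeric values, and the stack depth are assumptions made for illustration, and the real definitions in <sys/enet.h> may differ. The operand order for the relational operators (second-from-top compared against top) is inferred from the "PupType > 0" example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encodings for this sketch only; see <sys/enet.h> for the
 * real ones.  Action in the low 10 bits, operator in the bits above. */
#define SK_OPSHIFT  10
#define SK_ACTMASK  ((1u << SK_OPSHIFT) - 1u)
#define SK_OP(n)    ((unsigned)(n) << SK_OPSHIFT)

#define SK_NOPUSH   0u
#define SK_PUSHLIT  1u
#define SK_PUSHZERO 2u
#define SK_PUSHWORD 16u

#define SK_NOP   SK_OP(0)
#define SK_EQ    SK_OP(1)
#define SK_LT    SK_OP(2)
#define SK_LE    SK_OP(3)
#define SK_GT    SK_OP(4)
#define SK_GE    SK_OP(5)
#define SK_AND   SK_OP(6)
#define SK_OR    SK_OP(7)
#define SK_XOR   SK_OP(8)
#define SK_COR   SK_OP(9)
#define SK_CAND  SK_OP(10)
#define SK_CNOR  SK_OP(11)
#define SK_CNAND SK_OP(12)
#define SK_NEQ   SK_OP(13)

#define SK_STACKMAX 32   /* arbitrary depth chosen for the sketch */

/* Returns 1 if the command list accepts the packet, 0 if it rejects it. */
int sk_filter_accepts(const uint16_t *cmd, size_t ncmd,
                      const uint16_t *pkt, size_t npkt)
{
    uint16_t stack[SK_STACKMAX];
    size_t sp = 0;   /* number of values currently on the stack */
    size_t pc = 0;   /* index of the next command word */

    while (pc < ncmd) {
        unsigned word = cmd[pc++];
        unsigned action = word & SK_ACTMASK;
        unsigned op = word & ~SK_ACTMASK;
        unsigned a, b, r;

        if (action != SK_NOPUSH) {          /* the action runs first */
            unsigned value;
            if (action == SK_PUSHLIT) {
                if (pc >= ncmd) return 0;   /* literal is missing */
                value = cmd[pc++];
            } else if (action == SK_PUSHZERO) {
                value = 0;
            } else if (action >= SK_PUSHWORD) {
                size_t n = action - SK_PUSHWORD;
                if (n >= npkt) return 0;    /* past end of packet */
                value = pkt[n];
            } else {
                return 0;                   /* undefined action */
            }
            if (sp == SK_STACKMAX) return 0;
            stack[sp++] = (uint16_t)value;
        }

        if (op == SK_NOP) continue;
        if (sp < 2) return 0;               /* too few operands */
        b = stack[--sp];                    /* top of stack */
        a = stack[--sp];                    /* second from top */

        switch (op) {
        case SK_EQ:  r = (a == b); break;
        case SK_NEQ: r = (a != b); break;
        case SK_LT:  r = (a < b);  break;
        case SK_LE:  r = (a <= b); break;
        case SK_GT:  r = (a > b);  break;
        case SK_GE:  r = (a >= b); break;
        case SK_AND: r = a & b;    break;
        case SK_OR:  r = a | b;    break;
        case SK_XOR: r = a ^ b;    break;
        /* Short-circuit operators leave nothing on the stack. */
        case SK_COR:   if (a == b) return 1; continue;
        case SK_CAND:  if (a != b) return 0; continue;
        case SK_CNOR:  if (a == b) return 0; continue;
        case SK_CNAND: if (a != b) return 1; continue;
        default: return 0;                  /* undefined operator */
        }
        stack[sp++] = (uint16_t)r;
    }
    /* An empty stack or a non-zero top of stack means "accept". */
    return sp == 0 || stack[sp - 1] != 0;
}
```

With these assumed encodings, the nine-word short-circuit example from the man page can be written as an array of uint16_t and passed in along with a packet viewed as shortwords. An empty command list accepts everything, matching the default filter that open(2) installs.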
Rudy.Nedved@H.CS.CMU.EDU.UUCP (12/19/86)
The ENET filter mechanism is nice but CMU is using the
BSD sockets mechanism. We still have a few applications
using the old mechanism but under a compatibility flag
and plan to flush the stuff.
From the perspective of a network application hacker, the
only major thing that is missing from the BSD mechanism
is the ability to filter certain types of raw packets. It
should not be necessary for one to modify the operating
system in order to see all packets of a certain type, or
length or from a certain host or going to a certain host.
The minor problems we have been ignoring. I don't like the
fact that the operating system code design seems to believe
it knows when data should be flushed for an SMTP connection
since it knows how to do it for a telnet and ftp data
connection. Heck, I like to get my performance anywhere I
can and dang...computers should be smart for the novice but
should not take control away from the expert...I hate waiting
for kernel bug fixes when a little more application level
control could create a work-around for the application.
-Rudy