robert@SPAM.ISTC.SRI.COM (Robert Allen) (12/16/86)
Pardon me for posting on a subject not related to TCP/IP; I have no good excuse... I'm wondering if anyone has attempted to develop other protocols on Sun computers using the "open architecture". From an initial inspection of the Sun document "Network Implementation", it appears that one can provide different protocol routines at various layers and make use of the kernel hooks built into the system, thus providing socket-type interfaces for protocols other than the currently supported TCP/IP and UDP (I knew I could make this letter pertinent). Specifically, I would like to know: a) whether anyone has tried this with other protocols, and if so, which protocols; b) which layers are supported in this open architecture; and c) what problems were encountered, if any. Any comments, questions, pointers, etc. are appreciated.
Robert Allen robert@spam.istc.sri.com OR robert@sri-spam.ARPA
steve@BRILLIG.UMD.EDU.UUCP (12/16/86)
The Sun networking implementation is very close to being identical to
the standard 4.2BSD implementation. Unfortunately, that makes development
of other protocols (unless they live on top of IP, in which case it's
not bad to do) more troublesome than you might expect as, if memory
serves, the networking implementation manual (also lifted from a 4.2BSD
manual) is incorrect (or, perhaps, misleading) in terms of talking
about its protocol independence. There are all sorts of nasty AF_INET
dependencies lurking about in there, everywhere from the device drivers
to the network interface and routing code to NFS. It is possible to
track all these dependencies down -- Chris Torek and James O'Toole
did it here when they did their Xerox NS implementation for 4.2BSD --
but it probably won't be a whole lot of fun to do.
Back under Sun Unix 2.0 I hacked some XNS support into the kernel.
The way I did it was to remark, "gee, the interface between the network
code and the rest of the kernel isn't so bad" and stuff the whole of the
4.3BSD beta networking code into the kernel, throwing the Sun/4.2BSD code
out. Depending on what you're doing, that may be a win. I believe
that it was for me, as I didn't have to write a NS implementation if
I worked it that way. Furthermore, changing the INET-dependent code
is probably not particularly hard, but you'll have to muck with the
innards of almost every kernel module in /sys/net*, and that could
be both tedious and frustrating. Finally, the 4.3BSD networking
implementation is very much improved over the 4.2BSD one in the area
of TCP/IP, so you get a better TCP/IP in the bargain.
Oh yes, and of course it looks easier to stuff an entirely new
(non-INET, non-NS) protocol into 4.3BSD than it does into 4.2BSD.
There can't be too many dependencies still lurking about, 'cause
the NS support works.
Hope this is of use to you.
-Steve
Spoken: Steve Miller ARPA: steve@mimsy.umd.edu Phone: +1-301-454-4251
CSNet: steve@mimsy.umd.edu UUCP: {seismo,allegra}!mimsy!steve
USPS: Computer Science Dept., University of Maryland, College Park, MD 20742
lantz@GREGORIO.STANFORD.EDU (Keith Lantz) (12/17/86)
Folks might also be interested to know that protocol development in
Berkeley UNIX has been rather easy for years at CMU and Stanford, who
jointly developed what is referred to as the "packet filter". A paper
on the packet filter, by Jeff Mogul, Mike Accetta, and Rick Rashid was
just presented at the Conference on Practical Software Development
Environments. Perhaps the first thing to know is that it provides for
application-level protocol development, rather than kernel hacking.
For example, that's how our ``UNIX server'' for the V-System is
implemented.
We have been beating on Berkeley for several years to include same with
the BSD distributions, with little success. Rumor has it that it IS
included in the 4.3 distribution, but as unsupported software. I am
not offering to support it myself, but if you're sufficiently
interested and vocal enough, who knows who might respond...
Keith
Following is the man page for the 4.3 version of the packet filter.
The 4.2 version differs somewhat.
ENET(4) UNIX Programmer's Manual ENET(4)
NAME
enet - ethernet packet filter
SYNOPSIS
pseudo-device enetfilter 64
DESCRIPTION
The packet filter provides a raw interface to Ethernets and
similar network data link layers. Packets received that are
not used by the kernel (i.e., to support IP, ARP, and on
some systems XNS, protocols) are available through this
mechanism. The packet filter appears as a set of character
special files, one per hardware interface. Each enet file
may be opened multiple times, allowing each interface to be
used by many processes. The total number of open ethernet
files is limited to the value given in the kernel configura-
tion; the example given in the SYNOPSIS above sets the limit
to 64.
The minor device numbers are associated with interfaces when
the system is booted. Minor device 0 is associated with the
first Ethernet interface ``attached'', minor device 1 with
the second, and so forth. (These character special files
are, for historical reasons, given the names /dev/enet0,
/dev/eneta0, /dev/enetb0, etc.)
Associated with each open instance of an enet file is a
user-settable packet filter which is used to deliver incom-
ing ethernet packets to the appropriate process. Whenever a
packet is received from the net, successive packet filters
from the list of filters for all open enet files are applied
to the packet. When a filter accepts the packet, it is
placed on the packet input queue of the associated file. If
no filters accept the packet, it is discarded. The format
of a packet filter is described below.
Reads from these files return the next packet from a queue
of packets that have matched the filter. If insufficient
buffer space to store the entire packet is specified in the
read, the packet will be truncated and the trailing contents
lost. Writes to these devices transmit packets on the net-
work, with each write generating exactly one packet.
The packet filter currently supports a variety of different
``Ethernet'' data-link levels:
3mb Ethernet packets consist of 4 or more bytes with the
first byte specifying the source ethernet
address, the second byte specifying the des-
tination ethernet address, and the next two
bytes specifying the packet type. (Actually,
on the network the source and destination
addresses are in the opposite order.)
byte-swapping 3mb Ethernet
packets consist of 4 or more bytes with the
first byte specifying the source ethernet
address, the second byte specifying the des-
tination ethernet address, and the next two
bytes specifying the packet type. Each short
word (pair of bytes) is swapped from the net-
work byte order; this device type is only
provided as a concession to backwards-
compatibility.
10mb Ethernet packets consist of 14 or more bytes with the
first six bytes specifying the destination
ethernet address, the next six bytes the
source ethernet address, and the next two
bytes specifying the packet type.
The remaining words are interpreted according to the packet
type. Note that 16-bit and 32-bit quantities may have to be
byte-swapped (and possibly short-swapped) to be intelligible
on a Vax.
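As a rough illustration of the 10mb layout described above, the sketch below splits a raw packet buffer into destination address, source address, and packet type. The names `ether10_hdr` and `parse_ether10` are invented for this example and are not part of <sys/enet.h>. The type field is assembled byte by byte, so the result does not depend on the host's byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ETHER10_ADDR_LEN 6   /* six-byte addresses, per the text above */
#define ETHER10_HDR_LEN  14  /* 6 + 6 + 2 bytes */

/* Hypothetical helper type, for this example only. */
struct ether10_hdr {
    unsigned char  dst[ETHER10_ADDR_LEN];
    unsigned char  src[ETHER10_ADDR_LEN];
    unsigned short type;     /* packet type, in host byte order */
};

/* Returns 0 on success, -1 if the buffer is shorter than a header. */
int parse_ether10(const unsigned char *pkt, size_t len,
                  struct ether10_hdr *out)
{
    assert(pkt != NULL && out != NULL);
    if (len < ETHER10_HDR_LEN)
        return -1;
    memcpy(out->dst, pkt, ETHER10_ADDR_LEN);
    memcpy(out->src, pkt + ETHER10_ADDR_LEN, ETHER10_ADDR_LEN);
    /* On the wire the high byte comes first; shifting avoids a host swap. */
    out->type = (unsigned short)((pkt[12] << 8) | pkt[13]);
    return 0;
}
```

A caller would pass the buffer filled in by read(2) on the enet file, together with the byte count that read returned.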
The packet filter mechanism does not know anything about the
data portion of the packets it sends and receives. The user
must supply the headers for transmitted packets (although
the system makes sure that the source address is correct)
and the headers of received packets are delivered to the
user. The packet filters treat the entire packet, including
headers, as uninterpreted data.
IOCTL CALLS
In addition to FIONREAD, ten special ioctl calls may be
applied to an open enet file. The first two set and fetch
parameters for the file and are of the form:
#include <sys/types.h>
#include <sys/enet.h>
ioctl(fildes, code, param)
struct eniocb *param;
where param is defined in <sys/enet.h> as:
struct eniocb
{
u_char en_addr;
u_char en_maxfilters;
u_char en_maxwaiting;
u_char en_maxpriority;
long en_rtout;
};
with the applicable codes being:
EIOCGETP
Fetch the parameters for this file.
EIOCSETP
Set the parameters for this file.
The maximum filter length parameter en_maxfilters indicates
the maximum possible packet filter command list length (see
EIOCSETF below). The maximum input wait queue size parameter
en_maxwaiting indicates the maximum number of packets
which may be queued for an ethernet file at one time (see
EIOCSETW below). The maximum priority parameter
en_maxpriority indicates the highest filter priority which
may be set for the file (see EIOCSETF below). The en_addr
field is no longer maintained by the driver; see EIOCDEVP
below.
The read timeout parameter en_rtout specifies the number of
clock ticks to wait before timing out on a read request and
returning an EOF. This parameter is initialized to zero by
open(2), indicating no timeout. If it is negative, then read
requests will return an EOF immediately if there are no
packets in the input queue. (Note that all parameters
except for the read timeout are read-only and are ignored
when changed.)
A different ioctl is used to get device parameters of the
ethernet underlying the minor device. It is of the form:
#include <sys/types.h>
#include <sys/enet.h>
ioctl(fildes, EIOCDEVP, param)
where param is defined in <sys/enet.h> as:
struct endevp {
u_char end_dev_type;
u_char end_addr_len;
u_short end_hdr_len;
u_short end_MTU;
u_char end_addr[EN_MAX_ADDR_LEN];
u_char end_broadaddr[EN_MAX_ADDR_LEN];
};
The fields are:
end_dev_type Specifies the device type; currently one of
ENDT_3MB, ENDT_BS3MB or ENDT_10MB.
end_addr_len Specifies the address length in bytes (e.g.,
1 or 6).
end_hdr_len Specifies the total header length in bytes
(e.g., 4 or 14).
end_MTU Specifies the maximum packet size, including
header, in bytes.
end_addr The address of this interface; aligned so
that the low order byte of the address is the
first byte in the array.
end_broadaddr The hardware destination address for broad-
casts on this network.
The next two calls enable and disable the input packet sig-
nal mechanism for the file and are of the form:
#include <sys/types.h>
#include <sys/enet.h>
ioctl(fildes, code, signp)
u_int *signp;
where signp is a pointer to a word containing the number of
the signal to be sent when an input packet arrives and with
the applicable codes being:
EIOCENBS
Enable the specified signal when an input packet is
received for this file. If the ENHOLDSIG flag (see
EIOCMBIS below) is not set, further signals are
automatically disabled whenever a signal is sent to
prevent nesting and hence must be specifically re-
enabled after processing. When a signal number of 0 is
supplied, this call is equivalent to EIOCINHS.
EIOCINHS
Disable any signal when an input packet is received for
this file (the signp parameter is ignored). This is
the default when the file is first opened.
The next two calls set and clear ``mode bits'' for the
file and are of the form:
#include <sys/types.h>
#include <sys/enet.h>
ioctl(fildes, code, bits)
u_short *bits;
where bits is a short word bit-mask specifying which bits to
set or clear. Currently, the only bit mask recognized is
ENHOLDSIG, which (if clear) means that the driver should
disable the effect of EIOCENBS once it has delivered a sig-
nal. Setting this bit means that you need use EIOCENBS only
once. (For historical reasons, the default is that ENHOLD-
SIG is set.) The applicable codes are:
EIOCMBIS
Sets the specified mode bits
EIOCMBIC
Clears the specified mode bits
Another ioctl call is used to set the maximum size of the
packet input queue for an open enet file. It is of the
form:
#include <sys/types.h>
#include <sys/enet.h>
ioctl(fildes, EIOCSETW, maxwaitingp)
u_int *maxwaitingp;
where maxwaitingp is a pointer to a word containing the
input queue size to be set. If this is greater than maximum
allowable size (see EIOCGETP above), it is set to the max-
imum, and if it is zero, it is set to a default value.
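The EIOCSETW rule just described can be modeled in a few lines. This is only a sketch of the stated behavior, not the driver's code: `effective_queue_size` is a made-up name, `max_allowed` stands in for the en_maxwaiting value returned by EIOCGETP, and the default is passed as a parameter because the manual page does not say what it is.

```c
#include <assert.h>

/* Models the queue-size rule for EIOCSETW as described in the text.
 * `dflt` is whatever default the driver applies (not stated in the
 * manual page, so it is an explicit parameter here). */
unsigned effective_queue_size(unsigned requested, unsigned max_allowed,
                              unsigned dflt)
{
    assert(max_allowed > 0);
    if (requested == 0)
        return dflt;            /* zero selects the default */
    if (requested > max_allowed)
        return max_allowed;     /* oversized requests are clamped */
    return requested;
}
```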
Another ioctl call flushes the queue of incoming packets.
It is of the form:
#include <sys/types.h>
#include <sys/enet.h>
ioctl(fildes, EIOCFLUSH, 0)
The final ioctl call is used to set the packet filter for an
open enet file. It is of the form:
#include <sys/types.h>
#include <sys/enet.h>
ioctl(fildes, EIOCSETF, filter)
struct enfilter *filter;
where enfilter is defined in <sys/enet.h> as:
struct enfilter
{
u_char enf_Priority;
u_char enf_FilterLen;
u_short enf_Filter[ENMAXFILTERS];
};
A packet filter consists of a priority, the filter command
list length (in shortwords), and the filter command list
itself. Each filter command list specifies a sequence of
actions which operate on an internal stack. Each shortword
of the command list specifies an action from the set {
ENF_PUSHLIT, ENF_PUSHZERO, ENF_PUSHWORD+N } which respec-
tively push the next shortword of the command list, zero, or
shortword N of the incoming packet on the stack, and a
binary operator from the set { ENF_EQ, ENF_NEQ, ENF_LT,
ENF_LE, ENF_GT, ENF_GE, ENF_AND, ENF_OR, ENF_XOR } which
then operates on the top two elements of the stack and
replaces them with its result. When both an action and
operator are specified in the same shortword, the action is
performed followed by the operation.
The binary operator can also be from the set { ENF_COR,
ENF_CAND, ENF_CNOR, ENF_CNAND }. These are ``short-
circuit'' operators, in that they terminate the execution of
the filter immediately if the condition they are checking
for is found, and continue otherwise. All pop two elements
from the stack and compare them for equality; ENF_CAND
returns false if the result is false; ENF_COR returns true
if the result is true; ENF_CNAND returns true if the result
is false; ENF_CNOR returns false if the result is true.
Unlike the other binary operators, these four do not leave a
result on the stack, even if they continue.
The short-circuit operators should be used when possible, to
reduce the amount of time spent evaluating filters. When
they are used, you should also arrange the order of the
tests so that the filter will succeed or fail as soon as
possible; for example, checking the Socket field of a Pup
packet is more likely to indicate failure than the packet
type field.
The special action ENF_NOPUSH and the special operator
ENF_NOP can be used to only perform the binary operation or
to only push a value on the stack. Since both are (con-
veniently) defined to be zero, indicating only an action
actually specifies the action followed by ENF_NOP, and indi-
cating only an operation actually specifies ENF_NOPUSH fol-
lowed by the operation.
After executing the filter command list, a non-zero value
(true) left on top of the stack (or an empty stack) causes
the incoming packet to be accepted for the corresponding
enet file and a zero value (false) causes the packet to be
passed through the next packet filter. (If the filter exits
as the result of a short-circuit operator, the top-of-stack
value is ignored.) Specifying an undefined operation or
action in the command list or performing an illegal opera-
tion or action (such as pushing a shortword offset past the
end of the packet or executing a binary operator with fewer
than two shortwords on the stack) causes a filter to reject
the packet.
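To make the stack-machine semantics above concrete, here is a minimal user-level sketch of a filter evaluator. It is not the driver's implementation. The numeric encodings of the ENF_ names are placeholders chosen for the example (the real ones come from <sys/enet.h>), and `enf_eval`, `ACT_BITS`, `ACT_MASK`, `OP`, and `STACK_MAX` are invented names. The packet is treated as an array of shortwords, and the operand order (second-from-top OP top) is inferred from the manual's own "PupType > 0" example.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder encodings for this sketch only; the real values come
 * from <sys/enet.h>. Low bits hold the action, high bits the operator. */
#define ACT_BITS     10
#define ACT_MASK     ((1u << ACT_BITS) - 1)
#define OP(n)        ((n) << ACT_BITS)
#define ENF_NOPUSH   0
#define ENF_PUSHLIT  1
#define ENF_PUSHZERO 2
#define ENF_PUSHWORD 16
enum {
    ENF_NOP = OP(0), ENF_EQ = OP(1), ENF_NEQ = OP(2), ENF_LT = OP(3),
    ENF_LE = OP(4), ENF_GT = OP(5), ENF_GE = OP(6), ENF_AND = OP(7),
    ENF_OR = OP(8), ENF_XOR = OP(9), ENF_COR = OP(10), ENF_CAND = OP(11),
    ENF_CNOR = OP(12), ENF_CNAND = OP(13)
};
#define STACK_MAX 32   /* arbitrary stack depth for the sketch */

/* Returns 1 to accept the packet, 0 to reject it. */
int enf_eval(const unsigned short *cmd, size_t ncmd,
             const unsigned short *pkt, size_t nwords)
{
    unsigned short stack[STACK_MAX];
    size_t sp = 0, pc = 0;

    assert(cmd != NULL || ncmd == 0);
    while (pc < ncmd) {
        unsigned action = cmd[pc] & ACT_MASK;
        unsigned op = cmd[pc] & ~ACT_MASK;
        pc++;

        if (action != ENF_NOPUSH) {          /* the action runs first */
            unsigned short v;
            if (action == ENF_PUSHLIT) {
                if (pc >= ncmd) return 0;    /* literal is missing */
                v = cmd[pc++];
            } else if (action == ENF_PUSHZERO) {
                v = 0;
            } else if (action >= ENF_PUSHWORD) {
                size_t n = action - ENF_PUSHWORD;
                if (n >= nwords) return 0;   /* offset past end of packet */
                v = pkt[n];
            } else {
                return 0;                    /* undefined action */
            }
            if (sp == STACK_MAX) return 0;   /* stack overflow */
            stack[sp++] = v;
        }
        if (op == ENF_NOP) continue;
        if (sp < 2) return 0;                /* fewer than two operands */

        unsigned short b = stack[--sp], a = stack[--sp];
        unsigned short r;
        switch (op) {
        case ENF_EQ:  r = (a == b); break;
        case ENF_NEQ: r = (a != b); break;
        case ENF_LT:  r = (a <  b); break;
        case ENF_LE:  r = (a <= b); break;
        case ENF_GT:  r = (a >  b); break;
        case ENF_GE:  r = (a >= b); break;
        case ENF_AND: r = a & b; break;
        case ENF_OR:  r = a | b; break;
        case ENF_XOR: r = a ^ b; break;
        /* Short-circuit operators: exit early or continue, push nothing. */
        case ENF_COR:   if (a == b) return 1; continue;
        case ENF_CAND:  if (a != b) return 0; continue;
        case ENF_CNOR:  if (a == b) return 0; continue;
        case ENF_CNAND: if (a != b) return 1; continue;
        default: return 0;                   /* undefined operator */
        }
        stack[sp++] = r;
    }
    /* An empty stack or a non-zero top of stack accepts the packet. */
    return sp == 0 || stack[sp - 1] != 0;
}
```

Feeding a command list and a packet buffer to `enf_eval` is a convenient way to check a filter on paper before handing it to EIOCSETF.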
In an attempt to deal with the problem of overlapping and/or
conflicting packet filters, the filters for each open enet
file are ordered by the driver according to their priority
(lowest priority is 0, highest is 255). When processing
incoming ethernet packets, filters are applied according to
their priority (from highest to lowest) and for identical
priority values according to their relative ``busyness''
(the filter that has previously matched the most packets is
checked first) until one or more filters accept the packet
or all filters reject it and it is discarded.
Filters at a priority of 2 or higher are called "high prior-
ity" filters. Once a packet is delivered to one of these
"high priority" enet files, no further filters are examined,
i.e. the packet is delivered only to the first enet file
with a "high priority" filter which accepts the packet. A
packet may be delivered to more than one filter with a
priority below 2; this might be useful, for example, in
building replicated programs. However, the use of low-
priority filters imposes an additional cost on the system,
as these filters each must be checked against all packets
not accepted by a high-priority filter.
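The delivery rule for high- and low-priority filters can be sketched as a simple loop. This is an illustration under stated assumptions, not the driver's code: `struct listener`, `dispatch`, and the two sample predicates are hypothetical names, each filter is reduced to a predicate function, the list is assumed to be sorted from highest to lowest priority already, and the ``busyness'' tie-breaker is left out.

```c
#include <assert.h>
#include <stddef.h>

#define HIGH_PRIORITY 2   /* threshold stated in the text */

/* Hypothetical stand-in for one open enet file and its filter. */
struct listener {
    unsigned char priority;
    int (*accepts)(const unsigned short *pkt, size_t nwords);
    int delivered;        /* packets queued to this file so far */
};

/* Sample predicates for the example. */
int match_any(const unsigned short *pkt, size_t nwords)
{
    (void)pkt; (void)nwords;
    return 1;
}

int match_type2(const unsigned short *pkt, size_t nwords)
{
    return nwords > 1 && pkt[1] == 2;
}

/* `ls` must be sorted from highest to lowest priority.
 * Returns the number of listeners that received the packet. */
size_t dispatch(struct listener *ls, size_t n,
                const unsigned short *pkt, size_t nwords)
{
    size_t hits = 0;

    assert(ls != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!ls[i].accepts(pkt, nwords))
            continue;
        ls[i].delivered++;
        hits++;
        if (ls[i].priority >= HIGH_PRIORITY)
            break;        /* high priority: the first match wins */
    }
    return hits;          /* 0 means the packet is discarded */
}
```

With one priority-10 listener using `match_type2` and two low-priority listeners using `match_any`, a type-2 packet goes only to the first listener, while any other packet is copied to both low-priority listeners.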
The packet filter for an enet file is initialized with
length 0 at priority 0 by open(2), and hence by default
accepts all packets which no "high priority" filter is
interested in.
Priorities should be assigned so that, in general, the more
packets a filter is expected to match, the higher its prior-
ity. This will prevent a lot of needless checking of pack-
ets against filters that aren't likely to match them.
FILTER EXAMPLES
The following filter would accept all incoming Pup packets
on a 3mb ethernet with Pup types in the range 1-0100:
struct enfilter f =
{
10, 19, /* priority and length */
ENF_PUSHWORD+1, ENF_PUSHLIT, 2,
ENF_EQ, /* packet type == PUP */
ENF_PUSHWORD+3, ENF_PUSHLIT,
0xFF00, ENF_AND, /* mask high byte */
ENF_PUSHZERO, ENF_GT, /* PupType > 0 */
ENF_PUSHWORD+3, ENF_PUSHLIT,
0xFF00, ENF_AND, /* mask high byte */
ENF_PUSHLIT, 0100, ENF_LE, /* PupType <= 0100 */
ENF_AND, /* 0 < PupType <= 0100 */
ENF_AND /* && packet type == PUP */
};
Note that shortwords, such as the packet type field, are
byte-swapped and so the literals you compare them to must be
byte-swapped. Also, although for this example the word
offsets are constants, code that must run with either 3mb or
10mb ethernets must use offsets that depend on the device
type.
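One way to avoid hard-coding offsets and literals is sketched below. Both helpers are hypothetical (`type_word_offset` and `swap16` are not in <sys/enet.h>). The first assumes, as in the 3mb and 10mb layouts described earlier, that the packet type occupies the last two bytes of the link header, so the end_hdr_len value reported by EIOCDEVP is enough to locate it.

```c
#include <assert.h>

/* Shortword index of the packet type field, given the link header
 * length in bytes. Assumes the type is the last two header bytes. */
unsigned type_word_offset(unsigned hdr_len)
{
    assert(hdr_len >= 2 && hdr_len % 2 == 0);
    return (hdr_len - 2) / 2;
}

/* Exchange the two bytes of a shortword, for building literals that
 * must be compared against byte-swapped packet fields. */
unsigned short swap16(unsigned short v)
{
    return (unsigned short)(((v & 0xFFu) << 8) | ((v >> 8) & 0xFFu));
}
```

For a 3mb header (4 bytes) this gives word 1, which matches the ENF_PUSHWORD+1 used in the examples; for a 10mb header (14 bytes) it gives word 6.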
By taking advantage of the ability to specify both an action
and operation in each word of the command list, the filter
could be abbreviated to:
struct enfilter f =
{
10, 14, /* priority and length */
ENF_PUSHWORD+1, ENF_PUSHLIT | ENF_EQ, 2, /* packet type == PUP */
ENF_PUSHWORD+3, ENF_PUSHLIT | ENF_AND,
0xFF00, /* mask high byte */
ENF_PUSHZERO | ENF_GT, /* PupType > 0 */
ENF_PUSHWORD+3, ENF_PUSHLIT | ENF_AND,
0xFF00, /* mask high byte */
ENF_PUSHLIT | ENF_LE, 0100, /* PupType <= 0100 */
ENF_AND, /* 0 < PupType <= 0100 */
ENF_AND /* && packet type == PUP */
};
A different example shows the use of "short-circuit" opera-
tors to create a more efficient filter. This one accepts
Pup packets (on a 3Mbit ethernet) with a Socket field of
12345. Note that we check the Socket field before the
packet type field, since in most packets the Socket is not
likely to match.
struct enfilter f =
{
10, 9, /* priority and length */
ENF_PUSHWORD+7, ENF_PUSHLIT | ENF_CAND,
0, /* High word of socket */
ENF_PUSHWORD+8, ENF_PUSHLIT | ENF_CAND,
12345, /* Low word of socket */
ENF_PUSHWORD+1, ENF_PUSHLIT | ENF_CAND,
2 /* packet type == Pup */
};
SEE ALSO
de(4), ec(4), en(4), il(4), enstat(8)
FILES
/dev/enet{,a,b,c,...}0
BUGS
The current implementation can only filter on words within
the first "mbuf" of the packet; this is around 100 bytes (or
50 words).
Because packets are streams of bytes, yet the filters
operate on short words, and standard network byte order is
usually opposite from Vax byte order, the relational opera-
tors ENF_LT, ENF_LE, ENF_GT, and ENF_GE are not all that
useful. Fortunately, they were not often used when the
packets were treated as streams of shorts, so this is prob-
ably not a severe problem. If this becomes a severe prob-
lem, a byte-swapping operator could be added.
Many of the "features" of this driver are there for histori-
cal reasons; the manual page could be a lot cleaner if these
were left out.
HISTORY
8-Oct-85 Jeffrey Mogul at Stanford University
Revised to describe 4.3BSD version of driver.
18-Oct-84 Jeffrey Mogul at Stanford University
Added short-circuit operators, changed discussion of
priorities to reflect new arrangement.
18-Jan-84 Jeffrey Mogul at Stanford University
Updated for 4.2BSD (device-independent) version,
including documentation of all non-kernel ioctls.
17-Nov-81 Mike Accetta (mja) at Carnegie-Mellon University
Added mention of <sys/types.h> to include examples.
29-Sep-81 Mike Accetta (mja) at Carnegie-Mellon University
Changed to describe new EIOCSETW and EIOCFLUSH ioctl
calls and the new multiple packet queuing features.
12-Nov-80 Mike Accetta (mja) at Carnegie-Mellon University
Added description of signal mechanism for input pack-
ets.
07-Mar-80 Mike Accetta (mja) at Carnegie-Mellon University
Created.
Printed 9/6/86 8 October 1985
Rudy.Nedved@H.CS.CMU.EDU.UUCP (12/19/86)
The ENET filter mechanism is nice but CMU is using the
BSD sockets mechanism. We still have a few applications
using the old mechanism but under a compatibility flag
and plan to flush the stuff.
From the perspective of a network application hacker, the
only major thing that is missing from the BSD mechanism
is the ability to filter certain types of raw packets. It
should not be necessary for one to modify the operating
system in order to see all packets of a certain type, or
length or from a certain host or going to a certain host.
The minor problems we have been ignoring. I don't like the
fact that the operating system code design seems to believe
it knows when data should be flushed for an SMTP connection
since it knows how to do it for a telnet and ftp data
connection. Heck, I like to get my performance anywhere I
can and dang...computers should be smart for the novice but
should not take control away from the expert...I hate waiting
for kernel bug fixes when a little more application level
control could create a work-around for the application.
-Rudy