[comp.os.minix] V1.3 posting #33 - networking documentation

ast@cs.vu.nl (Andy Tanenbaum) (07/14/88)

                        MINIX NETWORKING

1. INTRODUCTION
     Network software can be divided into two general categories differing
in the way the software is integrated into the operating system and the user
software.  When networks first developed, they were used over slow wide-area
links (56 kbps or less), so the designers' main concern was using the 
available bandwidth efficiently.  Programmer convenience was not considered.  
Later, as higher bandwidth networks became widespread (especially local area 
networks, such as Ethernet), the focus changed from worrying about bandwidth 
utilization, to worrying about making the network interface convenient for 
the programmers.  This evolution is very similar to the evolution from 
assembly language programming, where the machine came first, to programming 
in high level languages, where the programmer came first.

     Networks of the first type are said to be connection oriented, and use
what are called sliding window protocols.  All older networks, especially
wide area networks, are of this type.  Some of the better known protocols are
X.25, TCP/IP, and OSI.  Networks of the second type are connectionless, and
use what is called remote procedure call (RPC).  Virtually all modern 
distributed operating systems are based on this concept.  Some well-known 
examples are the work of Xerox PARC [1], the V kernel [2], and Amoeba [3-9].
While it is certainly possible to build RPC on top of a connection-oriented
protocol, this approach is inefficient compared to building the RPC on top of
the bare network.  For an introduction to connection-oriented protocols, RPC,
and networking in general, see [10].

     Networking in MINIX is based on RPC.  Briefly summarized, communication
between two processes works as follows.  One of the processes, called the
server, has some service to offer, such as file storage.  The other process,
the client, wants to use this service.  The interface to the service consists
of a collection of procedures that the client can call.  In the case of a file
server, the procedures might be CREATE_FILE, RENAME_FILE, READ_DATA,
WRITE_DATA, and so on.  These are library routines available on the client's
machine.

     When the client calls one of these procedures, the procedure sends a
message to the server containing the procedure name and its parameters.  The
procedure then blocks waiting for the reply.  When the message gets to the
server, it is decoded there and executed.  The reply is sent back to the
calling procedure on the client's machine, which then returns the results to
the caller.  From the programmer's point of view, having remote services in the
network essentially means that there is a new collection of procedures to 
call.  The programmer is not burdened with concepts like opening connections,
sending data, or thinking in terms of acknowledgements, all of which are
needed in the connection-oriented model.  Nor is the network software burdened
with having to manage connections.  

     In effect, RPC is based on the abstraction of the procedure call, whereas
connection-oriented networks are based on the much lower-level concept of 
making the network look like an input/output device.  While at first glance it 
might seem that connection-oriented networking could be made to fit with the
UNIX/MINIX concept of a pipe, pipes are set up in a very different way (by a
common ancestor), and fit very poorly to the most common style of local area
network programming, where the client has a request and the server gives a 
response.  With wide area networks, this kind of interaction is painfully slow,
due to the low bandwidth, so the only services generally available are mail
and file transfer, which are batch-oriented.  MINIX networking has been
designed for interactive use on high performance local area networks, so for
this reason, RPC has been chosen over the older connection-oriented style.

     In particular, MINIX networking has been designed to be compatible with
the form of RPC used in the Amoeba distributed operating system [3-9].  Not
only have the concepts and the implementation been well tested, but the
performance is exceedingly good.  For example, for doing file transfers,
something that connection-oriented protocols are supposed to be good at,
Amoeba running on two Sun 3s achieves triple the throughput of TCP/IP running
on the same hardware.  Data transfers between two Zenith Z-248s running
the Amoeba RPC on MINIX have been measured at 165 kbytes/sec, almost as fast
as TCP/IP transfers between two Sun 3/50s.  Considering that the Suns are
two times as fast as the Z-248s and the network software is 100% CPU
limited (doubling the CPU speed doubles the throughput), this is a strong
argument for the Amoeba RPC.  As a final statistic, the RPC throughput between
a client and server located on the same Z-248 is 1.5 Mbytes/sec, an extremely
high figure for this class of machine, and much better than what Suns and VAXes
normally achieve locally, despite their greater CPU power.  In conclusion,
although RPC was chosen for its elegance and ease of use, it turns out that it
also has excellent performance, even doing things like bulk transfer, and
certainly doing things like short request-reply interactions.

     A few words about Amoeba are probably in order here.  It is a distributed 
operating system that was developed at the Vrije Universiteit in Amsterdam and
is now being used there, at the Centre for Mathematics and Computer Science in
Amsterdam, and at a number of other research centers in several countries.  It
is written in C and currently runs on a wide variety of microprocessors and
minicomputers (including the Sun 3, various other MC68020 systems, the PDP-11,
and the DEC VAX).  Note that Amoeba is a complete operating system, just like
UNIX, MINIX or VMS.  The only relation between Amoeba and MINIX is that MINIX
networking uses the Amoeba RPC protocols.  Other than that they are quite
different in structure, functionality, and goals.  Amoeba was designed to run
on systems consisting of dozens of processors, and yet give the programmer the
illusion that it is a traditional single-CPU time sharing system.  For more
information about Amoeba, see the references.


2. OBJECTS
     Amoeba is an object-oriented system, and to a considerable extent this
orientation is reflected in the protocol.  As a consequence, MINIX also
acquires a certain object-orientation.  Very briefly, an object is a programmer
defined abstract data type that has well-defined operations on it.  As an
example, a file server could define file and directory objects, and provide
operations to read and write the file objects, and insert files in, and delete
files from, directory objects.  Clients can perform these operations by doing 
RPCs with the file server.  Henceforth we will adopt the Amoeba terminology and
call these RPCs "transactions."  A transaction consists of a request message
from a client to a server, followed by a reply message from the server back to
the client.

     It is up to the writer of each server to decide what kinds of objects the
server will support and what operations will be available on them.  The 
structure of the system guarantees that clients can only perform the 
operations provided by the server.  This style of networking is intended to 
force constraints on programmers, just as high-level languages force 
constraints on former assembly-language programmers.

     Objects are normally protected by capabilities, which are currently
128-bit numbers, although in the next version of Amoeba (Amoeba 4.0) this will
become 256 bits.  When a client asks a server to create an object, the server
returns a capability for the object.  This capability must be presented by the
client to perform subsequent operations on the object.  In Amoeba, capabilities
are protected cryptographically.  Since the MINIX kernel, unlike the Amoeba
kernel, was not designed from scratch as a distributed system, the protection
aspects in MINIX are not fully implemented.

     A capability has 4 fields, described below.  These fields are important
because they appear in the Amoeba and MINIX message headers.

	Port:	48-bit number used to identify the server owning the object.
	Object:	24-bit number used by the server to identify the object
	Rights:	8 bits telling which operations are allowed
	Cksum:	48-bit checksum to prevent tampering with the capability

The "port" field is a (random) 48-bit number used for addressing.  Any 48-bit
number can be used as a port.  In some situations, an ASCII string can be used
as a port, with the first 48 bits taken as the port number.  All messages
in Amoeba and MINIX are sent to ports, not to machine addresses.  The mapping
of ports to machine addresses is done deep down in the system, and is of little
concern to the average programmer.  Thus: a port uniquely identifies a server
and provides a logical address to which all messages for the server are sent.
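
     To make the layout concrete, here is a minimal sketch in C of a 128-bit
capability with the four fields listed above.  The struct, field, and function
names are invented for this illustration and are not the declarations found
in "amoeba.h".  Padding a short ASCII name with zero bytes is likewise an
assumption made here.

```c
#include <string.h>

#define PORT_BYTES	6	/* 48-bit port */
#define OBJECT_BYTES	3	/* 24-bit object number */
#define CKSUM_BYTES	6	/* 48-bit checksum */

/* Hypothetical layout of a capability (not taken from amoeba.h).
 * Byte arrays are used so that no padding is inserted.
 */
struct cap_sketch {
	unsigned char c_port[PORT_BYTES];	/* identifies the server */
	unsigned char c_object[OBJECT_BYTES];	/* identifies the object */
	unsigned char c_rights;			/* 8 rights bits */
	unsigned char c_cksum[CKSUM_BYTES];	/* guards the rights bits */
};

/* Build a port from an ASCII string.  Only the first 48 bits (6 bytes)
 * are used; shorter names are padded with zero bytes (an assumption).
 */
void port_from_string(unsigned char port[PORT_BYTES], const char *name)
{
	size_t n = strlen(name);

	if (n > PORT_BYTES) n = PORT_BYTES;
	memset(port, 0, PORT_BYTES);
	memcpy(port, name, n);
}
```

With this layout the port takes 6 bytes and the private part takes the
remaining 10, for 16 bytes (128 bits) in all.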

     The remaining three fields are called the private part of the capability.
In theory, each server can use them any way it wants to.  In practice, to
prevent total chaos, all existing servers adhere to the following conventions
(just as most UNIX programs adhere to the convention that certain files contain
ASCII characters with a line feed at the end of each line).  The "object" field
is used by the server to identify the specific object being accessed.  For
example, when a file server created a new file on behalf of a client, it could
put the i-node number of the new file in this field, so that when the client
later used the capability, the server could tell which file was being 
addressed.  The field is 24 bits long, providing each server with 16 million
object identifiers.

     The "rights" field contains a bit map for up to eight protected
operations.  Each bit controls permission to perform one operation.  Thus a
file server could allocate bit 0 for READ_DATA, bit 1 for WRITE_DATA, bit 2 for
APPEND_DATA, bit 3 for DELETE_FILE, and so on.  When a capability arrives from
a client, the server checks to see if the bit corresponding to the relevant 
operation is on.  If it is not, the operation is rejected.  In this way, a user
can create a file, ask the server to turn off the WRITE_DATA and DELETE_FILE
bits, and then give the capability to another user.  This new user cannot
perform WRITE_DATA and DELETE_FILE operations, but can perform the operations
whose bits are turned on.
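
     The rights check described above amounts to a few bit operations.  The
sketch below uses the bit assignments from the example; the macro and function
names are made up for the illustration.  Note that in the real system the
restriction has to be carried out by the server, since the cksum field must
be recomputed whenever the rights change.

```c
/* Hypothetical bit assignments, following the example in the text. */
#define RGT_READ	0x01	/* bit 0: READ_DATA */
#define RGT_WRITE	0x02	/* bit 1: WRITE_DATA */
#define RGT_APPEND	0x04	/* bit 2: APPEND_DATA */
#define RGT_DELETE	0x08	/* bit 3: DELETE_FILE */

/* Return 1 if the rights byte permits the operation, 0 otherwise. */
int op_allowed(unsigned char rights, unsigned char op_bit)
{
	return (rights & op_bit) != 0;
}

/* Return a copy of the rights byte with the given bits turned off. */
unsigned char restrict_rights(unsigned char rights, unsigned char remove)
{
	return (unsigned char) (rights & ~remove);
}
```

For instance, restrict_rights(all, RGT_WRITE | RGT_DELETE) yields a rights
byte with which op_allowed() still accepts RGT_READ but rejects RGT_WRITE.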

     A moment's thought will reveal that the above protection scheme is
worthless if users can turn the rights bits on and off by themselves.  To
prevent this, the "cksum" field is used.  When creating a new object, the
server simultaneously creates a random number and stores it in its internal
tables (e.g., in the i-node).  It then combines the rights bits and the
random number, and passes the result through a one-way cryptographic function.
The result of this function is put in the cksum field.  When a capability
comes in from a client, the server uses the object number to locate the
original random number.  It then combines it with the rights bits present in
the capability, and runs the result through the one-way function.  If the
result disagrees with the cksum field, the capability is considered invalid, 
and an error return is sent back.  In this way, users who change the rights 
bits will simply invalidate their capabilities.  Attempts to break the scheme 
by finding an inverse to the one-way function can be handled by choosing a
cryptographically strong one-way function.  Brute force does not work either,
as picking cksums at random will require, on the average, 2**47 attempts to
guess the 48-bit cksum.  Since a null transaction over a 10 Mbit/sec Ethernet
using Sun 3/50s takes about 1.4 msec, about 6000 years are needed to perform
the search.  Furthermore, it is easy enough to program a server to artificially
increase the transaction time to 1 sec after 10 unsuccessful attempts have
been made, thus increasing the mean search time to about 4,500,000 years.
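
     The checking procedure can be sketched in a few lines of C.  Everything
named below is hypothetical.  In particular, toy_one_way() is only a bit mixer
standing in for the one-way function; it is not cryptographically strong and
is not the function Amoeba uses.  The way the rights bits and the random
number are combined (here, by concatenation) is also an assumption.

```c
#include <stdint.h>

#define CKSUM_MASK 0xFFFFFFFFFFFFULL	/* keep the low 48 bits */

/* Stand-in for the one-way function: a simple 64-bit mixer, truncated
 * to 48 bits.  For illustration only; NOT cryptographically strong.
 */
uint64_t toy_one_way(uint64_t x)
{
	x ^= x >> 33;
	x *= 0xff51afd7ed558ccdULL;
	x ^= x >> 33;
	x *= 0xc4ceb9fe1a85ec53ULL;
	x ^= x >> 33;
	return x & CKSUM_MASK;
}

/* Server side, at object creation or when rights are restricted:
 * combine the rights bits with the object's random number.
 */
uint64_t make_cksum(uint8_t rights, uint64_t secret)
{
	return toy_one_way((secret << 8) | rights);
}

/* Server side, on every request: recompute the cksum from the rights
 * bits in the capability and the stored random number, and compare.
 * Returns 1 if the capability is valid, 0 otherwise.
 */
int cap_valid(uint8_t rights, uint64_t cksum, uint64_t secret)
{
	return make_cksum(rights, secret) == cksum;
}
```

A client that turns on extra rights bits without knowing the random number
presents a cksum that no longer matches, so cap_valid() rejects the request.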


3. OVERVIEW OF TRANSACTIONS
    To summarize what we have covered so far, the normal style of networking in
MINIX (and Amoeba) is to structure dialogues in terms of clients and servers.
Each server manages one or more types of objects, and provides operations that
clients can perform on these objects.  When a client asks a server to
create an object for it, the server then returns a capability for the object
to the client.  This capability identifies the server, identifies the object,
and tells which subset of the operations the holder of the capability may
perform.  To have an operation performed, the client sends a request message
to the server (with the capability embedded in the message header), and the
server then sends back a reply.  In most cases, the calls to the server are
embedded in library procedures, called "stubs", to encapsulate the message 
passing and hide it from the users.  

      Transactions provide a basis for a large number of user services.
In MINIX, users can use them to build arbitrary services.  Two key services
are provided as standard for MINIX: remote execution and remote file copying.
These services make use of a process called the shell server, or sherver for
short.  The sherver accepts messages from remote (or local) clients, executes
the commands in them, and returns the output.

     Communication is implemented as follows. Each server listens to a
unique 48-bit port.  A client that wants service from the server sends a 
request to that port and blocks until it receives a reply. (If the client 
cannot find anyone listening to the port after a given period, it times
out and returns an error status.)  When the server is ready, it returns a reply
to the client,  which then continues execution. Each transaction is independent
of the previous transactions; there is no connection or virtual circuit.


     Clients must have some way of discovering a server's port.  Under Amoeba 
a directory server is used. The directory server stores capabilities for 
objects and associates them with an ASCII string.  The directory server has a 
well known port.  Under MINIX you make initial contact with a sherver that has
a well known port and then the sherver creates a secret port for all further 
transactions on that machine.

    There are four stub routines in the user library which provide the basic
interface between user processes and transactions.  They are:

	1. getreq() - get request (used by servers to get a request)
	2. putrep() - put reply (used by servers to send reply)
	3. trans() - transaction (used by clients to do a transaction)
	4. timeout() - sets the time limit at which trans() gives up

Getreq() and putrep() are used by servers to get a request from a client and
to send a reply.  A server may not do a getreq() until it has replied to the 
previous getreq().  The call trans() is used by clients to send a request to a
server. It blocks until a reply or a signal arrives, or, if it cannot find a 
server listening to the port, it times out and returns an error code.
The length of the timeout is set using the function timeout().  This timeout
has to do with locating servers, not how long they have to do the work.

     Messages of up to 30000 bytes can be sent between client and server.
This limit will increase to 1 Gbyte in the next version of Amoeba but will
probably remain at 30000 bytes in MINIX due to the small address space of the
IBM PC.  It is possible to provide security so that servers only execute 
remote procedure calls for authorized users. The protection mechanism uses 
capabilities and is discussed in detail in the references.  It will not be
discussed much here.  This protection mechanism is not implemented in the 
remote shell software available with MINIX.  (It requires a directory server, 
among other things. The implementation is left as an exercise for the reader.)


4. SYNTAX AND SEMANTICS OF TRANSACTION PRIMITIVES
     Now we will take a detailed look at the syntax and semantics of the 
library routines for using transactions, followed by some simple examples to 
indicate how the functions are typically used.  Remember that when programming
with transactions, the primitives used in C programs are getreq(), putrep(),
trans(), and timeout().  These can be thought of as "network system calls,"
although they are not implemented quite like that in MINIX.  If you are
building a server, it will typically have a main loop with a getreq() at the
top, a switch in the middle based on some field of the incoming message, and
a putrep() at the bottom.  Furthermore, the server writer will generally also
provide a set of stub procedures that contain trans() calls to access the
server.  The average user will call these library procedures, and will not
make trans() calls directly, although he is, of course, free to do so if he 
wishes.

     Transaction messages always begin with a special header.  The exact
layout of these messages is defined by the Amoeba protocol.  By using this
protocol, MINIX machines can communicate with one another, and with Suns, 
Vaxes, and PDP-11s running Amoeba.  Device drivers have also been written for
UNIX to allow UNIX processes to speak Amoeba, and have Amoeba clients and
servers run on UNIX.  At the Vrije Universiteit, all the Suns, Vaxes, and
PDP-11s that run UNIX have such drivers to communicate with each other and with
machines running Amoeba and MINIX.  It is the local lingua franca, just as
TCP/IP is at some sites.

     The Amoeba header is defined in the include file "amoeba.h", which must
be included in all programs using transactions.  The header definition is given
below.  The types used in the header struct are also defined in "amoeba.h".

typedef struct {
	port	h_port;		/* port (i.e., logical address) of the dest. */
	port	h_signature;	/* used for authentication and protection */
	private	h_priv;		/* 10 bytes: object, rights, and cksum */
	unshort h_command;	/* code for operation desired/status returned*/
	long	h_offset;	/* parameter field */
	unshort	h_size;		/* parameter field */
	unshort	h_extra;	/* parameter field */
} header;

     The message header contains the port to which the message should be sent, 
a command/status field for use by the server and space for some parameters to 
go with the command or status.  Let us now look at the four network primitives.
The first one, getreq, has the following declaration:

	unshort getreq(hdr, buffer, size)
	header *hdr;
	char *buffer;
	unshort	size;

The three parameters refer to the header, the buffer, and the buffer size,
respectively.  In a sense, they are analogous to the parameters of the MINIX
READ and WRITE system calls.  The hdr parameter points to a header struct,
which is used to allow the server to specify which port it wants to listen to.
The h_port field of the header must be initialized with the port number.
The buffer parameter is a pointer to a buffer to hold the incoming message.
It can hold a maximum of size bytes, specified by the third parameter.  If
successful, getreq() returns the number of bytes of data in the buffer that
were actually received.  In addition, the other fields of the header are filled
in by the system.  If an error occurs, it returns a negative error code (since
the return type is unshort, cast the result to short before testing its sign).
Possible error codes (defined in "amoeba.h") are: 

	FAILED:	a null port was given or a getreq was attempted before 
		the previous putrep() was done
	BADADDRESS: the buffer pointer and/or size was not valid
	ABORTED: a signal was received
	TRYAGAIN: there were no free transaction slots in the kernel tables

Note that after a getreq(), trans() may be used to communicate with another
server before doing the putrep(). In other words, a server may call other
servers to help it do its job, but it may not process multiple transactions
simultaneously.  (In Amoeba, server processes may contain multiple threads
to allow parallelism, but MINIX does not allow multiple threads per process.)

     The next call is putrep(), used by servers to reply to requests and send
back results and status information.  The declaration is:

	unshort putrep(hdr, buffer, size)
	header *hdr;
	char *buffer;
	unshort	size;

The header returned contains status information, and possibly a new port
(in the h_signature field).  A buffer containing size bytes of data is also
returned to the client.  If successful, putrep() returns the number of bytes 
sent.  The reply message is not acknowledged, so that a successful return
from this call does not guarantee that the client got the reply.  In general,
it is up to the client to try again if the reply is not forthcoming quickly
enough.  Possible error conditions for putrep() are defined in "amoeba.h" as
follows:

	FAILED: no getreq() was done first
	BADADDRESS: the buffer pointer and/or size was not valid
	ABORTED: a signal was received

     Now we come to the call used by clients to request services and wait for
replies.  Servers can also use this call to request services from other 
servers.  Thus at one instant a process may be acting as a server and at 
another the same process may be acting as a client.  The client call is:

	unshort trans(hdr1, buffer1, size1, hdr2, buffer2, size2)
	header *hdr1, *hdr2;
	char *buffer1, *buffer2;
	unshort	size1, size2;

The call has two independent sets of parameters.  Those with suffix 1 are
used for sending the request message to the server.  Those with suffix 2 are
used for getting the reply.  Both sets have a header, a buffer, and a size.
The two hdr pointers point to structs for message headers.  The first one
contains parameters copied to the outgoing message to the server and the
second one contains space for the data to be copied in from the server's
putrep().  The two buffer parameters are for the outgoing and incoming data,
respectively, and the two sizes tell how large these buffers are.

     After making a trans() call, the client blocks until the message has
been sent, received, processed by the server, and replied to.  Only then can
the client continue execution.  At this point the fields of hdr2 and buffer2
will contain the reply data.  Like MINIX itself, transactions support only this
synchronous form of communication.  Experience has painfully shown that 
asynchronous stream communication is difficult for programmers to deal with.
After all, everything else in programming languages is synchronous.  (Can you
imagine what it would be like to have a procedure call return control to the
caller before having finished its work?)

     If successful, trans() returns the number of bytes in the reply.
Possible error codes are: 

	FAILED: a null port was given or the server crashed between doing the 
		getreq and the putrep
	NOTFOUND: the port locate failed to find a server before the timeout
	BADADDRESS: a buffer pointer and/or size was not valid
	ABORTED: a signal was received
	TRYAGAIN: there were no free transaction slots in the kernel's tables

The final network primitive deals with setting timeouts.  When a client first
does a transaction on a previously unknown port, the kernel broadcasts a locate
message to find the server.  It then waits a certain amount of time for a 
server to reply.  If no server replies before the timer goes off, the trans()
fails with NOTFOUND.  The timeout() call allows the client to determine how
long to wait for a server to reply.  After a reply has been received, the
kernel keeps it in a cache, so that locates will not be needed subsequently.
It is important to realize that the timeout relates to locating servers, not
to how much time servers have to perform their work.  The declaration is:

	unshort timeout(time)
	unshort	time;

The function sets the length of the locate timeout in tenths of a second.
The default is 300 (30 seconds). A timeout of 0 means do not time out.
The timeout() call returns the length of the previous timeout. 
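
     Because timeout() returns the previous setting, a client can shorten the
locate timeout for one transaction and then put the old value back.  The
sketch below shows the pattern.  So that it can be tried on any system, it
supplies its own stand-in for timeout() that merely remembers the value; the
helper secs_to_tenths() is invented for the example.

```c
typedef unsigned short unshort;

/* Stand-in for the library routine.  It only records the setting and
 * returns the previous one, which is the behavior described in the text.
 */
static unshort locate_timeout = 300;	/* default: 300 tenths = 30 sec */

unshort timeout(unshort time)
{
	unshort old = locate_timeout;

	locate_timeout = time;
	return old;
}

/* Convert seconds to the tenths-of-a-second unit used by timeout(). */
unshort secs_to_tenths(unshort secs)
{
	return (unshort) (secs * 10);
}

/* Typical use in a client:
 *
 *	unshort saved = timeout(secs_to_tenths(5));	give up after 5 sec
 *	... do the trans() here ...
 *	(void) timeout(saved);				restore old value
 */
```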


5. STRUCTURE OF SERVERS AND CLIENTS
     In this section we will examine typical servers and clients to give an
idea of how they are structured.

5.1 Server structure
     A typical server has the following form:

	/* Declarations needed by the server. */
	header hdr;			/* header for receiving requests */
	char buffer[BUFSIZE];		/* buffer for receiving requests */
	char reply[BUF2SIZE];		/* buffer for sending replies */
	unshort	size, replysize;	/* sizes of the two buffers */
	unshort	getreq();		/* function declaration */
	char *strncpy();		/* string function */

	signal(SIGAMOEBA, SIG_IGN);	/* ignore signals */

	while (1) {

	  /* Have the server listen to a 48-bit port equal to ASCII "MyServ" */
	  strncpy(&hdr.h_port, "MyServ", HEADERSIZE);

	  /* Wait for a request to come in for that port. */
	  size = getreq(&hdr, buffer, BUFSIZE);

	  /* If the size returned is negative then an error occurred. */
	  if ((short) size < 0) {
		handle_error();
	  } else {
		perform_request_found_in_buffer(); /* carry out the work */
		hdr.h_status = OK; 		   /* or whatever */
		putrep(&hdr, reply, replysize);	   /* send reply back */
	  }
	}

If all the information necessary for the request is in the headers then the 
buffers in getreq() and putrep() can be replaced by the value NILBUF and the 
buffer sizes can be replaced by 0.
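
     The skeleton above leaves out the switch on the incoming request that a
real server has in the middle of its loop.  One way to write that part is
sketched below.  The header is a cut-down stand-in with only the fields used
here, the command and status codes are invented, and treating h_status as
another name for h_command is an assumption based on the header comment
("code for operation desired/status returned").

```c
#include <string.h>

typedef unsigned short unshort;

/* Cut-down stand-in for the header in "amoeba.h". */
typedef struct {
	unshort	h_command;	/* request code in, status out */
	long	h_offset;	/* parameter field */
	unshort	h_size;		/* parameter field */
} header;
#define h_status h_command	/* assumption: status reuses this field */

/* Hypothetical command and status codes for a toy server. */
#define CMD_ECHO	1	/* send the request data back */
#define CMD_LENGTH	2	/* report the request length in h_size */
#define STATUS_OK	0
#define STATUS_BADCMD	1

/* Carry out one request.  Returns the number of bytes placed in reply,
 * which is what the server would pass to putrep() as the size.
 */
unshort dispatch(header *hdr, char *request, unshort reqsize,
		 char *reply, unshort replymax)
{
	unshort n = 0;

	switch (hdr->h_command) {
	case CMD_ECHO:
		n = reqsize < replymax ? reqsize : replymax;
		memcpy(reply, request, n);
		hdr->h_status = STATUS_OK;
		break;
	case CMD_LENGTH:		/* answer fits in the header */
		hdr->h_size = reqsize;
		hdr->h_status = STATUS_OK;
		break;
	default:
		hdr->h_status = STATUS_BADCMD;
		break;
	}
	return n;
}
```

CMD_LENGTH is an example of a request whose reply is carried entirely in the
header, so the server could answer it with NILBUF and a size of 0.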

5.2 Client structure
     The structure of a client program is much more variable.  A program that 
deals with the above server might look like this: 

	/* Declarations needed by the client. */
	header	hdr;			/* header used for request */
	char buffer[BUFSIZE];		/* buffer used for request */
	short size;			/* size of the buffer */
	unshort	trans();		/* function declaration */
	char *strncpy();		/* string function */

	/* Initialize server port to "MyServ". */
	strncpy(&hdr.h_port, "MyServ", HEADERSIZE);

	/* Send request to server listening to that port. */
	size = (short) trans(&hdr, buffer, BUFSIZE, &hdr, NILBUF, 0);
	if (size < 0) {
		printf("trans failed %d\n", size);
	} else {
		if (hdr.h_status != OK)	/* nonzero status is an error */
			work_not_done();
		else
			successful_trans();
	}
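
     A server writer would normally wrap code like the above in a stub, so
that users see an ordinary procedure.  The following sketch shows what such a
stub might look like for a READ_DATA operation.  To keep it runnable away from
MINIX, it carries its own stand-ins for the "amoeba.h" declarations and a fake
trans() that answers from a local array.  The command code, the status codes,
the use of h_offset and h_size as parameters, and the h_status alias are all
assumptions made for the example.

```c
#include <string.h>

/* Stand-ins for "amoeba.h"; on MINIX, include the real file instead. */
typedef unsigned short unshort;
typedef struct { char p_bytes[6]; } port;
typedef struct { char pr_bytes[10]; } private;
typedef struct {
	port	h_port;
	port	h_signature;
	private	h_priv;
	unshort	h_command;
	long	h_offset;
	unshort	h_size;
	unshort	h_extra;
} header;
#define h_status	h_command	/* assumption: status reuses field */
#define NILBUF		((char *) 0)
#define CMD_READ_DATA	1		/* hypothetical command code */
#define STATUS_OK	0		/* hypothetical status codes */
#define STATUS_ERR	1

/* Fake trans(): plays the part of a file server holding one tiny file. */
static const char fake_file[] = "hello, world";

unshort trans(header *hdr1, char *buf1, unshort size1,
	      header *hdr2, char *buf2, unshort size2)
{
	long off = hdr1->h_offset;
	long len = (long) sizeof(fake_file) - 1;
	unshort n = hdr1->h_size;

	(void) buf1; (void) size1;
	if (hdr1->h_command != CMD_READ_DATA || off < 0 || off > len) {
		hdr2->h_status = STATUS_ERR;
		return 0;
	}
	if (n > len - off) n = (unshort) (len - off);
	if (n > size2) n = size2;
	memcpy(buf2, fake_file + off, n);
	hdr2->h_status = STATUS_OK;
	return n;
}

/* The stub: read up to n bytes starting at offset.  Returns the number
 * of bytes read, or -1 if the transaction or the operation failed.
 */
int read_data(long offset, char *buf, unshort n)
{
	header hdr;
	short got;

	memset(&hdr, 0, sizeof(hdr));
	memcpy(&hdr.h_port, "MyServ", sizeof(hdr.h_port));
	hdr.h_command = CMD_READ_DATA;
	hdr.h_offset = offset;
	hdr.h_size = n;
	got = (short) trans(&hdr, NILBUF, 0, &hdr, buf, n);
	if (got < 0 || hdr.h_status != STATUS_OK) return -1;
	return got;
}
```

The caller simply writes n = read_data(7L, buf, 5) and never sees the header
or the message passing.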

5.3 Signal Handling
     The semantics of signals with transactions is important for programmers to
understand.  If a client receives a signal while doing a trans(), the signal 
propagates to the server.  If the server is also doing a trans() then it 
propagates again to the next server, and so on.  The aim of this is to request
all servers to terminate their transaction as soon as possible.
 
     If the server receiving the signal is not doing a transaction and not 
already doing a putrep() then the server code must handle the signal.  It may 
choose to catch the signal and send a reply immediately or simply ignore the 
signal.  If it does not catch the signal then it will die since the signal 
propagated is SIGAMOEBA (which is defined as SIGEMT for MINIX).  In this case 
the transaction will fail (with return status FAILED for the client).
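
     A server that prefers to catch the signal and reply early, rather than
ignore it or die, might be organized along the following lines.  This is only
a sketch: the function names are invented, the work loop is a placeholder, and
the fallback to SIGTERM exists solely so that the example can be tried on
systems that have no SIGEMT.

```c
#include <signal.h>

#ifndef SIGAMOEBA
# ifdef SIGEMT
#  define SIGAMOEBA SIGEMT	/* as on MINIX, according to the text */
# else
#  define SIGAMOEBA SIGTERM	/* stand-in where SIGEMT does not exist */
# endif
#endif

/* Set by the handler; polled by the server between units of work. */
static volatile sig_atomic_t abort_requested = 0;

static void on_amoeba(int sig)
{
	(void) sig;
	abort_requested = 1;
}

/* Arrange to catch the propagated signal instead of dying from it. */
void install_abort_handler(void)
{
	signal(SIGAMOEBA, on_amoeba);
}

/* Perform up to nsteps units of work for the current request, stopping
 * early if the client has been signaled.  Returns the number of units
 * completed; the caller would then do its putrep() right away.
 */
int do_work(int nsteps)
{
	int done = 0;

	while (done < nsteps && !abort_requested)
		done++;		/* placeholder for one unit of real work */
	return done;
}
```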

     Once the transaction is completed the client process will be signaled.
It in turn must handle the original signal (not necessarily SIGAMOEBA).  The 
exact transaction semantics of Amoeba are not supported under MINIX due to
difficulty in keeping user processes alive until a transaction terminates
after a signal.  Signal propagation does occur, but the client may die before 
a reply comes in.  This should not matter too much for most applications. 
In the next rewrite of Amoeba the syntax and semantics of these functions will
change in non-compatible ways, but this will probably not appear in MINIX.


6. IMPLEMENTATION OF TRANSACTIONS IN MINIX
     Amoeba transactions are implemented in the MINIX kernel as a number of 
kernel tasks. Several alterations were made to the kernel to support these 
tasks, including the addition of an (optional) ethernet driver (for the 
Western Digital EtherCard Plus (TM), also known as the WD8003E) and the
possibility to specify the size of the stack for kernel tasks on a per task 
basis.  (Amoeba tasks need larger stacks than the other MINIX kernel tasks.)  
There is also an extra system call that is handled by MM.  This is the Amoeba 
system call and is the interface to the kernel.  Special handling of signals is
also provided for in the MM task. 

     There are five kernel tasks for Amoeba.  The first acts as a manager 
which accepts asynchronous events.  Possible events are:

	1. An ethernet packet has arrived
	2. A local signal has arrived
	3. A user task involved in an active transaction has died
	4. A sweep timeout has occurred 

(Locate timeouts are implemented using a counter which is decremented every
tenth of a second by a sweep routine.)  Each of the other four tasks manages a
single user process' transactions.  Thus, a maximum of four processes can
simultaneously do transactions under MINIX.  The number of transaction tasks 
is, however, a constant in an include file and can be increased if needed.
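
     The countdown scheme for locate timeouts can be pictured as follows.  The
table, its fields, and the function name are invented for this sketch and do
not correspond to the actual kernel code; the only things taken from the text
are the tenth-of-a-second tick, the limit of four transaction tasks, and the
rule that a timeout of 0 means "never time out."

```c
#define NR_TRANS 4	/* four transaction tasks in standard MINIX */

struct slot {
	int locating;		/* nonzero while a locate is outstanding */
	unsigned timer;		/* tenths of a second left; 0 = no limit */
};

/* Called once every tenth of a second.  Decrements the timer of each
 * slot that is busy locating, and returns how many locates expired on
 * this sweep (each of these would make a trans() fail with NOTFOUND).
 */
int sweep(struct slot tab[], int n)
{
	int i, expired = 0;

	for (i = 0; i < n; i++) {
		if (!tab[i].locating || tab[i].timer == 0)
			continue;
		if (--tab[i].timer == 0) {
			tab[i].locating = 0;
			expired++;
		}
	}
	return expired;
}
```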

     In the MINIX kernel there is a table which keeps a record of the current 
state of a transaction.  This table is called "am_task" and is declared in the
file "amoeba.c".  It records many things, including the process number of
the task doing the transaction, the current state (locating, waiting for a 
reply, waiting for a request, etc.) and the relevant ports and machine 
addresses. 

     The Amoeba network protocol is a stop and wait protocol that guarantees
at most once delivery of a message.  A message consists of the concatenation of
the transaction header with the data in the buffer (if any) given to trans(),
getreq() or putrep().  The transaction code divides messages up into packets 
which fit on the underlying network medium (which is ethernet in the case of 
MINIX).  It then sends over the message fragments and they are reassembled on 
the remote machine before being given to the recipient.

     Each packet begins with an ethernet header (which consists of the source 
and destination ethernet addresses) followed by a 10-byte Amoeba internet 
header containing data about the source and destination processes to ensure
that the message is delivered to the correct process.  The rest of the packet 
is used for sending data.
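
     The arithmetic of splitting a message into packets is sketched below.
It assumes that every packet carries as much data as a standard Ethernet
frame allows (1500 bytes) less the 10-byte Amoeba internet header; the real
code may use a different packet size.  The names are invented for the example.

```c
#define ETHER_MAX_DATA	1500	/* largest standard Ethernet payload */
#define AM_HDR_SIZE	10	/* Amoeba internet header, per the text */
#define PKT_DATA	(ETHER_MAX_DATA - AM_HDR_SIZE)

/* Number of packets needed to carry a message of msglen bytes.  An
 * empty message is assumed to take one packet.
 */
long packets_needed(long msglen)
{
	if (msglen <= 0) return 1;
	return (msglen + PKT_DATA - 1) / PKT_DATA;
}

/* Number of message bytes carried by packet number i (counting from 0),
 * or 0 if the message ends before that packet.
 */
long fragment_size(long msglen, long i)
{
	long left = msglen - i * PKT_DATA;

	if (left <= 0) return 0;
	return left < PKT_DATA ? left : PKT_DATA;
}
```

Under these assumptions, a maximum-size message of 30000 data bytes plus a
transaction header of 32 bytes (the sum of the field sizes, if the header is
packed) needs 21 packets, the last of which carries 232 bytes.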


7. COMPILING THE SYSTEM 
     There are several interesting things you need to know before you can build
a MINIX kernel with Amoeba transactions in it.  First of all, you do not need 
an Ethernet to use transactions.  You can have your clients and servers running
on a single machine.  In this mode, it is possible to write and debug network
software without having a network.  Later, when you move to a real network, the
code will already be fully debugged, as the system itself makes no distinction
between local and remote transactions.  

     Second, the transaction code is quite substantial.  So much so that it
would tend to overshadow the rest of MINIX if it were fully integrated into it.
This fact, combined with the knowledge that not all MINIX users are interested
in networking has led to adding a new top-level directory in MINIX, amoeba.
This directory and its subdirectories contain all the networking code.  If you
are not interested in networking, copy the entire top-level amoeba directory to
a diskette in case you later become interested, and then type

	rm -rf amoeba

This will get rid of all the networking code, and you can continue as usual.
The only thing you will notice is a few #ifdefs in the normal code that relate
to networking; they will all be disabled if you do not specifically enable 
them. 

     Installation of networking is largely auto-configured using the makefiles
provided.  Two new -D entries are used in the mm and amoeba/kernel makefiles:

	-DAM_KERNEL	(used in mm and amoeba/kernel) enables networking
	-DNONET		(used in amoeba/kernel only) single machine networking,
			 in other words, local transactions only

If you use -DAM_KERNEL but not -DNONET, you get full networking and MUST have 
a Western Digital Etherplus card.  To install the makefiles and make other
necessary changes, run the install shell script (amoeba/install).
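
     As a rough illustration of how the two -D entries combine, the shell
function below classifies a CFLAGS string.  The name net_mode is made up for
this sketch and is not part of the distribution.  It also assumes that -DNONET
has no effect unless -DAM_KERNEL is present, which the text above implies but
does not state outright.

```shell
# Hypothetical helper (not part of MINIX): report which networking mode a
# CFLAGS string selects, following the description of -DAM_KERNEL and -DNONET.
net_mode() {
    flags=" $1 "                      # pad so each flag is matched as a whole word
    case "$flags" in
        *" -DAM_KERNEL "*)
            case "$flags" in
                *" -DNONET "*) echo "local transactions only" ;;
                *)             echo "full networking (Etherplus card required)" ;;
            esac ;;
        *)  echo "networking disabled" ;;   # assumption: -DNONET alone does nothing
    esac
}

net_mode "-DAM_KERNEL -DNONET"
net_mode "-DAM_KERNEL"
net_mode "-O"
```

The function only matches flags separated by single spaces, which is enough to
show the three cases.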

     If you add a new kernel task of your own then it MUST come between the
Amoeba kernel tasks and the printer task in the file kernel/table.c and should
be numbered relative to AMOEBA_CLASS in the file h/com.h (i.e., the task number
should be AMOEBA_CLASS+1 for the first new task, AMOEBA_CLASS+2 for the second
new task, etc.).  Be sure to set NR_TASKS correctly.

     To compile and install networking, you must follow the steps below 
carefully.


How to Install Amoeba
---------------------

Perform the following steps in the order given.


 1. Make sure that you are in the amoeba directory.
    Run the command:

	install

 2. If you do not have much free disk space then do the following:
    Go to the fs directory (i.e., ../fs) and type:
	make clean
    Then go to the mm directory (i.e., ../mm) and type:
	make clean
    Then go to the kernel directory (i.e., ../kernel) and type:
	make clean

 3. Go to the amoeba/mm directory and run make.

 4. Go to the amoeba/fs directory and run make.

 5. Go to the amoeba/kernel directory (NOT the regular kernel directory).
    If you do NOT have an ethernet card but still wish to have local Amoeba
    transactions then edit the makefile and add -DNONET to the CFLAGS.
    If you do have an ethernet card and would like to keep ethernet statistics
    then add -DSTATISTICS to CFLAGS.

 6. Now run make.

 7. Go to the tools directory and build a new boot floppy.
    The command to do this is:
	make net

 8. Reboot your machine using the new boot floppy.

 9. Test the system.  The directory amoeba/examples contains several programs 
    to test the reliability of transactions.  The READ_ME file in the directory
    gives more details.

10. If you have an ethernet card then install the network tools.  The directory
    amoeba/util contains utilities for remote shells, remote file copying and
    message sending.  These only work with machines that have Amoeba
    transactions installed.  The READ_ME file there gives more details.
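
     The build part of this procedure (steps 1 to 7) can be summarized as a
command sequence.  The sketch below only prints the commands and runs nothing.
The function name amoeba_build_plan is invented here, and the paths assume that
you start in the top-level MINIX source directory and that tools is the usual
top-level tools directory.  Step 5 (editing CFLAGS) remains a manual step.

```shell
# Print, but do not run, the command sequence for steps 1-7.
# Paths are relative to the top-level MINIX source directory (an assumption).
# Pass "clean" as the first argument to include step 2.
amoeba_build_plan() {
    echo "(cd amoeba && install)"                      # step 1
    if [ "${1:-}" = clean ]; then                      # step 2, if disk space is short
        for dir in fs mm kernel; do
            echo "(cd $dir && make clean)"
        done
    fi
    # Step 5 is done by hand: add -DNONET or -DSTATISTICS to CFLAGS in
    # amoeba/kernel/makefile before the last make below.
    for dir in amoeba/mm amoeba/fs amoeba/kernel; do   # steps 3, 4 and 6
        echo "(cd $dir && make)"
    done
    echo "(cd tools && make net)"                      # step 7
}

amoeba_build_plan clean
```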


8. NETWORKING UTILITIES 
     There are several utility programs which you may find useful if you have a
network connection.  They are listed below with a brief outline of their use.
Other utilities are possible and reasonably simple to write as shell scripts
that use rsh (remote shell, described below).  The utilities are located in the
amoeba/util directory.

8.1 Remote Shell
     One of the main features of MINIX networking is the remote shell, rsh.
This utility sends a command over the network to a server (a sherver,
described below), which executes it.  The syntax of this command is:

	rsh [-bei] <port> <command> 

This program executes the command specified by <command> on the machine with a
sherver (described below) listening to the port <port>, which is an ASCII
string of up to 6 characters.  It is used to generate a unique port name for
the underlying transaction mechanism.

     Normally standard output and standard error from the command are written 
on standard output of the local process.  If the -e flag is specified then they
are kept separate. The -i flag specifies that standard input for the command 
should come from the local process.  The -b flag specifies that the rsh should
be started in the background.  Some examples:

	rsh bozo

starts an interactive shell on the machine running a sherver with port bozo.
Subsequent commands that you type will be fed to the remote shell.  You can
use cd to change to a directory on the remote machine, ls to list files in the
remote directory, and any other commands you want.  In effect, rsh gives you a
simple form of remote login.  Note that to make this work, the remote process
listening on the port bozo must be a shell server (sherver).

     As a second example of rsh, consider

	rsh jumbo cat /etc/passwd

which displays on your screen the file /etc/passwd from the machine running a 
sherver with port jumbo.  The rsh command could also have redirected this 
output to a local file or pipe.

     A slightly more complex example is

	rsh -i freddo 'cat >/usr/ast/junk' </etc/termcap

which runs the command

	cat >/usr/ast/junk

on the machine running a sherver with port freddo and takes as input 
the file /etc/termcap from the local machine.  Note that by quoting the
second argument, it is passed as a string to the remote sherver.  If the
command contains magic characters (e.g., *.c) the resulting action depends on
whether the command is quoted or not.  If it is not quoted, the local shell
will expand the magic characters before rsh is even called.  If the command
is quoted, the command string is passed unmodified to the remote sherver,
which then expands it in the directory it is currently working in.
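
     The difference between quoted and unquoted magic characters can be tried
out without a network.  In the sketch below, fake_remote is a hypothetical
stand-in for rsh plus a sherver: it simply runs the command string with sh -c
in another directory.  The directory and file names are made up for the
demonstration.

```shell
# Two scratch directories play the local and the remote machine.
demo_root=$(mktemp -d)
LOCAL_DIR="$demo_root/local"
REMOTE_DIR="$demo_root/remote"
mkdir "$LOCAL_DIR" "$REMOTE_DIR"
touch "$LOCAL_DIR/a.c" "$REMOTE_DIR/x.c" "$REMOTE_DIR/y.c"

# Hypothetical stand-in for rsh + sherver: run the command string in the
# "remote" directory.  The real rsh sends the string over the network.
fake_remote() {
    ( cd "$REMOTE_DIR" && sh -c "$*" )
}

# Unquoted: the local shell expands *.c before fake_remote is even called.
unquoted=$(cd "$LOCAL_DIR" && fake_remote echo *.c)
# Quoted: the string is passed unmodified and expanded on the "remote" side.
quoted=$(cd "$LOCAL_DIR" && fake_remote 'echo *.c')

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```

The unquoted form reports the local file a.c, while the quoted form reports
the files x.c and y.c found in the remote directory.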

     When you log into a remote machine with rsh, you get a shell having the
uid and gid of the sherver (see below).  To get your own uid and gid, type

	exec su george

assuming that your login is george.  If you have a password, su will ask for
it.  Needless to say, the su program will use /etc/passwd on the remote 
machine.  Do not forget to use exec, as this eliminates the need for an extra
shell.  If you do not need your own uid, don't bother, as it costs memory.
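
     The effect of exec can be seen with any Bourne-style shell.  The sketch
below is not specific to MINIX or to su; it only shows that a command started
with exec takes over the process of the shell that started it, so no extra
shell is left waiting.

```shell
# The outer shell prints its process id, then uses exec to replace itself
# with a second shell, which prints its own id.  Because exec reuses the
# process, both ids are the same.
pids=$(sh -c 'echo $$; exec sh -c "echo \$\$"')
set -- $pids
echo "before exec: $1   after exec: $2"
```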

8.2 Shervers
     To enable remote shell operations, it is necessary to have a sherver
running on the destination machine.  Shervers can be started up by:

	sherver <port> 

assuming that sherver is kept in /usr/bin.  This program listens to the
port specified and accepts a single request from the program rsh.  It then
executes it with the uid and gid of the sherver.  When it is finished, the
sherver exits.

     The sherver gets its input from a pipe.  This means that it can only do
those things possible with a pipe as input. In particular, signals (e.g., DEL),
EOF (e.g., CTRL-D), and the ioctl system call do not work properly.  Hitting
DEL remotely will kill the sherver.  There is no simple solution, except to
use stty to change your DEL character so that you do not hit it out of habit.

8.3 Masters
     Another useful program is master.  It is started up as follows:

	master <count> <uid> <gid> <command> 

This program starts up <count> copies of the program specified by <command>
with user id <uid> and group id <gid>.  The command may be given parameters.
If at any time the command exits or dies then master will start up a new
invocation of it.  This was designed to work with shervers but has other 
applications as well.  

For example,

	/usr/bin/master 1 2 2 /etc/sherver mumbo

will start a single sherver listening to the port `mumbo' and ensure that
there is always a sherver running.  This sherver will have uid=2 and gid=2,
so that rsh calls to mumbo will be executed with this uid/gid combination.  
We suggest starting master from the /etc/rc file of any machine
running shervers. When a sherver finishes executing a command, it exits.
By having master running in the background all the time, every time a sherver
exits, its parent, master, will create a new one.  This mechanism is somewhat
akin to init creating a new login process whenever a shell exits.  Since $PATH
is generally not set prior to executing /etc/rc, master should be specified as
/usr/bin/master (or whatever).
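
     The restart behavior of master can be pictured as a simple loop.  The
shell function below is only a sketch of that idea, not the real program: the
name respawn and its MAX argument are made up, it runs one copy at a time
rather than <count> copies, and it does not change uid or gid.

```shell
# respawn MAX CMD [ARG...]: run CMD and start it again each time it exits.
# MAX is an addition that keeps this demonstration finite; the real master
# keeps going for as long as it runs.
respawn() {
    max=$1; shift
    started=0
    while [ "$started" -lt "$max" ]; do
        started=$((started + 1))
        # Like a sherver, the command serves one request and then exits.
        "$@" || echo "exit status $?, restarting" >&2
    done
    echo "started $started times"
}

# A stand-in for a sherver: handle one request, then exit.
respawn 3 echo "request served"
```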


8.4 File Transfer
     The standard MINIX networking provides for file transfer using a shell
script called rcp (remote cp).  The syntax of the call is

	rcp [port!]from_file [port!]to_file 

It can also do local file copy but this is more easily accomplished with cp. 
Here are two examples of rcp usage:

	rcp jumbo!/etc/passwd .
	rcp jumbo!/etc/passwd freddo!/usr/ast/pebble

The first one will copy the file /etc/passwd from the machine running a sherver
with the port jumbo to the file passwd in the current directory. The second one
will copy the file /etc/passwd from the machine running a sherver with the port
jumbo to the file /usr/ast/pebble on the machine running a sherver with the 
port freddo.  Thus it is possible to issue commands on machine A to copy files
from machine B to machine C.
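
     Since rcp is a shell script, it has to split each [port!]file argument
into its two parts.  The sketch below shows one way this could be done with
standard parameter expansion.  It is not taken from the actual rcp script, and
the name split_spec is hypothetical.

```shell
# Split "[port!]file" into $port and $file.  An empty $port means the file
# is local.  Only the first "!" separates the port from the path.
split_spec() {
    case "$1" in
        *!*) port=${1%%!*}; file=${1#*!} ;;
        *)   port="";       file=$1 ;;
    esac
}

for spec in 'jumbo!/etc/passwd' 'freddo!/usr/ast/pebble' .; do
    split_spec "$spec"
    echo "port='$port' file='$file'"
done
```

The arguments are written in single quotes so that the shell passes the "!"
through untouched.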

8.5 Remote Pipes
     It is possible to set up remote pipes using the programs 'to' and 'from'.
The program 'to' reads from standard input and writes its output to the named
port.  Similarly, 'from' reads from the named port and writes to standard
output.  For example, consider the following commands, possibly given on two 
different machines:

	cat F* | sort | to 'port66'
	from 'port66' | uniq -c | sort -n

The first command concatenates files beginning with 'F', sorts them, and writes
the output to 'port66'.  The second command reads from 'port66' and provides
input to the rest of the pipeline.
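
     The same data flow can be tried on a single machine that lacks 'to' and
'from' by letting a named pipe (FIFO) play the role of the port.  This is only
an analogy: the real programs use Amoeba transactions and can run on different
machines.  The sample data and the scratch directory are made up for the
sketch.

```shell
# A named pipe stands in for the port 'port66'.
dir=$(mktemp -d)
port="$dir/port66"
mkfifo "$port"

# Plays the role of:  cat F* | sort | to 'port66'
printf 'pear\napple\npear\n' | sort > "$port" &

# Plays the role of:  from 'port66' | uniq -c | sort -n
counts=$(uniq -c < "$port" | sort -n)
wait                      # let the background writer finish
echo "$counts"
rm -r "$dir"
```

The output lists each distinct line with its count, the least frequent first.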


9. THE ETHERNET INTERFACE
     The ethernet driver in this version of Minix is for the Western Digital 
Ethercard Plus card, which is also known as the WD8003E. The ethernet 
controller chip on this board is the National Semiconductor DP8390.
If you have a different type of ethernet controller then there are several
things you need to know about the interface between the driver and the Amoeba
transaction layer in order to write a suitable driver for your card.

     There were several fundamental assumptions made while designing the high 
level protocol which affect the ethernet driver.

  1. The ethernet controller has enough local memory to buffer at least one
     incoming packet and one outgoing packet and will not overwrite a buffer
     with a new incoming packet until the buffer has been released.

  2. Read buffers are released in the same order as they were allocated.
     After a read interrupt has occurred and (*bufread)() has been called,
     then bufread will not be called again until an eth_release has been done.

  3. The ethernet driver generates no write interrupts.  This is because we
     found that the chip on the ethernet card was so much faster than the CPU
     on the Zenith AT clone that each packet had always been sent before the
     next one was ready.  If necessary, the high level code busy-waits until
     the ethernet write buffer is free.  If write interrupts are required then
     pkt_sent() in amoeba.c should be modified.
     This is rather disgusting, but was done for efficiency reasons.  Interrupt
     handling is expensive under Minix and so it was much faster to not
     generate interrupts and just wait for the buffer to become free.

There are several routines used by the high level code which should be
provided by the ethernet driver.  Unless otherwise stated, these routines are 
called in the file amoeba.c.

1. etheraddr	- get ethernet address of this host from ROM.
2. eth_init	- initialises the ethernet card and sets pointers to routines
		  to be called on packet arrival and departure.
3. eth_getbuf	- returns pointer to next write buffer.
4. eth_write	- writes the current "write buffer" to the net.
5. eth_release	- release a read buffer for reuse.
6. eth_stp	- shuts up the ethernet chip so that reboot can stop all
		  interrupts from the chip.  The normal reboot procedure
		  doesn't stop the WD8003E from running, so the next time
		  interrupts are enabled it makes a fuss (called from klib88.s).

The files dp8390.c, dp8390.h, dp8390info.h and dp8390stat.h contain routines
specific to the NS DP8390 chip.  These may need some slight changes before
working correctly with another manufacturer's board which also uses this chip.
The files etherplus.c and etherplus.h contain routines specific to the WD8003E
board.

10. REFERENCES

 1. Birrell, A.D., and Nelson, B.J.: "Implementing Remote Procedure Calls,"
    ACM Transactions on Computer Systems, vol. 2, pp. 39-59, Feb. 1984.

 2. Cheriton, D.: "The V Kernel: A Software Base for Distributed Systems,"
    IEEE Software Magazine, vol. 1, pp. 19-42, April 1984.

 3. Bal, H.E., Renesse, R. van, and Tanenbaum, A.S.: "Implementing Distributed 
    Algorithms using Remote Procedure Call," Proc. National Computer Conference
    AFIPS, pp. 499-505, 1987.

 4. Renesse, R. van, Tanenbaum, A.S., Staveren, H., and Hall, J.: "Connecting
    RPC-Based Distributed Systems using Wide-Area Networks," Proc. Seventh
    International Conf. on Distr. Computer Systems, IEEE, pp. 28-34, 1987.

 5. Tanenbaum, A.S., Mullender, S.J., and van Renesse, R.: "Using Sparse 
    Capabilities in a Distributed Operating System," Proc. Sixth 
    International Conf. on Distr. Computer Systems, IEEE, 1986. 

 6. Mullender, S.J., and Tanenbaum, A.S.: "The Design of a Capability-Based 
    Distributed Operating System," Computer Journal, vol. 29, pp. 289-299, 
    Aug. 1986.

 7. Tanenbaum, A.S., and Renesse, R. van: "Distributed Operating Systems," 
    Computing Surveys, vol. 17, pp. 419-470, Dec. 1985.

 8. Mullender, S.J., and Tanenbaum, A.S.: "A Distributed File Service Based on 
    Optimistic Concurrency Control," Proc. Tenth Symp. Oper. Syst. Prin., 
    pp. 51-62, 1985.

 9. Mullender, S.J., and Tanenbaum, A.S.: "Protection and Resource Control in 
    Distributed Operating Systems," Computer Networks, vol. 8, pp. 421-432, 
    Oct. 1984.

10. Tanenbaum, A.S.: "Computer Networks," 2nd ed., Englewood Cliffs, NJ: 
    Prentice-Hall, 1989.

bae@ati.tis.llnl.gov (Hwa Jin Bae) (07/17/88)

In article <860@ast.cs.vu.nl> ast@cs.vu.nl (Andy Tanenbaum) writes:
>
>                        MINIX NETWORKING

This is great stuff!  For a lot of us, Western Digital's WD8003E is just
about the only ethernet card we can afford to buy at about $200.  It
also happens to be of relatively higher performance than other cards that
cost more $$$.  RPC based networking also sounds great.  Will we have
to pay any extra bucks for all this?

>10. Tanenbaum, A.S., "Computer Networks, 2nd ed., Englewood Cliffs, NJ: 
>    Prentice-Hall, 1989.

1989?   Hmm...  This is a let-down.  My friends and I here have been
waiting for this new edition to come out for some time now since the last
brief mention about the completion of the rewriting of this book by AST
in this news group.  Can you enlighten us as to when it will actually
be available in the U.S.?
Hwa Jin Bae          | Standard excuses...not responsible.../dev/null...etc.
Control Data Corp.   | (415) 463 - 6865
4234 Hacienda Drive  | bae@tis.llnl.gov			   (Internet)
Pleasanton, CA 94566 | {ames,ihnp4,lll-crg}!lll-tis!bae    (UUCP)