[comp.protocols.tcp-ip] Implementing TCP/IP outside of UNIX kernel?

balenson@TIS.COM (David M. Balenson) (03/30/89)

Is it possible to implement TCP/IP in a UNIX (Berkeley 4.{2,3}, SunOS{3,4})
system OUTSIDE of the kernel?  I presume doing so would have a major impact
on efficiency, but it might be much easier to program.  Does anyone know of
any such TCP/IP implementations?  Thanks.

-David M. Balenson
 Trusted Information Systems
 (301) 854-5358

crocker@LA.TIS.COM (03/31/89)

David,

A general (and reasonably obvious) comment on your question: At the
very least, you need access to a network driver to get the data in and
out of your machine.  The connection management stuff can be
implemented as user code, but, as you suggested, may be
extraordinarily inefficient.  (Vint Cerf implemented his first version
of TCP (or perhaps NCP) this way, using one process per connection,
and the result was "of academic interest only.")

A separate but related problem is how the ordinary user processes are
going to communicate with the TCP/IP implementation.  If TCP/IP is not
available as a kernel service, then you have to connect to it via the
standard interprocess communication mechanism.  I'm not familiar with
the current IPC facilities in Unix, but somehow you'd have to set up
your TCP/IP implementation as a server.

And to answer your direct question, I'm afraid I don't know of an
implementation like this.

Zeke

dyer@spdcc.COM (Steve Dyer) (04/01/89)

In article <8903301431.AA19142@TIS.COM> balenson@TIS.COM (David M. Balenson) writes:
>Is it possible to implement TCP/IP in a UNIX (Berkeley 4.{2,3}, SunOS{3,4})
>system OUTSIDE of the kernel?  I presume doing so would have a major impact
>on efficiency, but it might be much easier to program.  Does anyone know of
>any such TCP/IP implementations?  Thanks.

Phil Karn's KA9Q TCP/IP implementation can run as a user process
under most flavors of UNIX, although I wager that only those folks
without a TCP/IP implementation on a machine would bother.  I got it
running under XENIX 386 with SLIP last year, although I never really beat
on it hard.  At the time it was a monolithic program and didn't provide
a socket application library, so it was mainly of tutorial interest to me.
I haven't examined it lately.

It is available for non-commercial use via anonymous FTP on bellcore.com
in the pub/ka9q directory.

smb@ulysses.homer.nj.att.com (Steven M. Bellovin) (04/01/89)

A number of years ago, 3Com marketed a product called UNET, an implementation
of TCP/IP that had IP in the kernel, but TCP in user space on UNIX systems.
It ran on V7, 4.1bsd, and derivatives; I (among others, I think) ported it
to System V of various flavors.  Among the debts the Internet community
owes to this implementation is SLIP; Rick Adams' original SLIP was
designed to be compatible with UNET's equivalent.

pcg@aber-cs.UUCP (Piercarlo Grandi) (04/04/89)

In article <8903301431.AA19142@TIS.COM> balenson@TIS.COM (David M.
Balenson) writes:

    Is it possible to implement TCP/IP in a UNIX (Berkeley 4.{2,3},
    SunOS{3,4}) system OUTSIDE of the kernel?

If you want to create a TCP/IP server, you have two problems: how to
make it communicate with the network interface, and how to make it
communicate with its clients.  The first is fairly easy (euphemism).
The second is easy, but requires the use of UNIX domain connections.

The server has a UNIX domain port on which it listens for requests to
open TCP/IP connections, etc.  On receiving such a request, it creates
a new UNIX domain socket pair and passes one end to the requestor
using the UNIX domain fd-in-a-message passing facility.  It keeps a
table mapping TCP/IP addresses to the UNIX domain connections it has
opened.

Whenever the client writes something on its UNIX domain connection to
the server, the table is used in one direction: all data read from that
fd is sent off to the network interface using the associated TCP/IP
address.  When data arrives from the network interface, the table is
used in the other direction, to map the TCP/IP address to the
appropriate fd.
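
Concretely, the socket pair creation and fd passing might look like
this (a sketch only, shown in the modern SCM_RIGHTS ancillary-data
form; the 4.2/4.3BSD version of the same idea used the older
msg_accrights field of struct msghdr):

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <sys/uio.h>
    #include <string.h>
    #include <unistd.h>

    /* On a connection request: make a socket pair, send one end to
     * the requesting client over the UNIX domain socket 'conn', and
     * return the other end for the server's address table. */
    int
    grant_connection(int conn)
    {
        int pair[2];
        struct msghdr msg;
        struct iovec iov;
        struct cmsghdr *cmsg;
        char cmsgbuf[CMSG_SPACE(sizeof(int))];
        char byte = 0;

        if (socketpair(AF_UNIX, SOCK_STREAM, 0, pair) < 0)
            return -1;

        memset(&msg, 0, sizeof msg);
        iov.iov_base = &byte;              /* must carry >= 1 data byte */
        iov.iov_len = 1;
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = cmsgbuf;
        msg.msg_controllen = sizeof cmsgbuf;
        cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;      /* "this message carries an fd" */
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &pair[1], sizeof(int));

        if (sendmsg(conn, &msg, 0) < 0) {
            close(pair[0]);
            close(pair[1]);
            return -1;
        }
        close(pair[1]);     /* client now holds its own reference */
        return pair[0];     /* server end, entered in the table */
    }

The client receives the descriptor with a matching recvmsg() and from
then on reads and writes it like any other fd, while the server shovels
data between its end of the pair and the network.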

    I presume doing so would have a major impact on efficiency,

Yes and no.  Of course a TCP/IP server (or, for that matter, any
protocol server written in this way) means that every message sent
costs two extra task switches and extra data copies, but beyond that
there is no other overhead.  If TCP/IP is infrequently used, the server
gets paged out, so more real memory is available.

Overall I'd expect lower performance, but on an OS designed for this
style of implementation this need not be true.

    but it might be much easier to program.

Oh yes, of course.  And, most importantly, you can have multiple
servers, change them dynamically, etc.

    Does anyone know of any such TCP/IP implementations?  Thanks.

The key to this whole business is to have the ability to pass fds in
messages from one process to another in the UNIX domain.

This facility was copied from the excellently designed (by Richard
Rashid) Accent IPC facility, which can now be found in Mach.  Also,
Edition 8/System V STREAMS have this ability, precisely for this
reason.  Amoeba has an equivalent facility, even if I don't like
the way it is designed.

As far as I know, network IPC in Accent (and even in Mach) is
optionally implemented using servers.

Note also that the unimplemented (and exceptionally difficult to
understand) user-implemented domains facility of 4.2BSD (and, in some
sense, wrappers) was "designed" (hackishly -- after all, one of the
principal architects was Dr. Joy :-]) to enable this.

Overall, I'd suggest you look at Mach.  It was designed (well -- after
all, one of the principal architects was Dr. Rashid) around this idea.

You can even look at these fds that get passed around in messages
as capabilities, and build a fully distributed (and secure)
capability-based system in this style.
-- 
Piercarlo "Peter" Grandi            |  ARPA: pcg%cs.aber.ac.uk@nss.cs.ucl.ac.uk
Dept of CS, UCW Aberystwyth         |  UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK  |  INET: pcg@cs.aber.ac.uk

ml@gandalf.UUCP (Marcus Leech) (04/07/89)

In article <2913@spdcc.SPDCC.COM>, dyer@spdcc.COM (Steve Dyer) writes:
> 
> Phil Karn's KA9Q TCP/IP implementation can run as a user process
> under most flavors of UNIX, although I wager that only those folks
> without a TCP/IP implementation on a machine would bother.  I got it
> running under XENIX 386 with SLIP last year, although I never really beat
> on it hard.  At the time it was a monolithic program and didn't provide
> a socket application library, so it was mainly of tutorial interest to me.
> I haven't examined it lately.
> 
> It is available for non-commercial use via anonymous FTP on bellcore.com
> in the pub/ka9q directory.

Indeed.  I spent some time tweaking the KA9Q TCP/IP code to provide a BSD
socket interface to applications.  The general notion was that TCP/UDP/IP
and the "device handlers" would run in a single process, with a socket
library providing an interface to applications via named pipes.  There's a
gotcha with sockets, however.  Sockets are objects on which it is legal to
do read() and write() calls (a socket looks like a file descriptor that is
both readable and writable).  Faking that up using (for example) named
pipes turned out to be more or less impossible.  The BSD socket library
also provides send() and recv(), and those could certainly be made to work
with a named-pipe/FIFO implementation -- provided you don't want read()
and write() functionality as well.
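
For what it's worth, the send half of the veneer was simple enough --
something like the following sketch.  (sock_send(), the framing header
and the library's FIFO descriptor are all private inventions of the
veneer, which is exactly the gotcha: a plain write() from the
application would bypass the framing.)

    #include <sys/types.h>
    #include <unistd.h>

    /* Hypothetical framing header: each message to the TCP/IP
     * process is prefixed with a socket id and a byte count, so the
     * single server process can demultiplex its many clients. */
    struct sock_hdr {
        int s;          /* library's socket identifier    */
        int len;        /* number of data bytes following */
    };

    /* fifo_fd: the named pipe to the TCP/IP process, opened by the
     * socket library at startup.  Returns len, or -1 on error. */
    int
    sock_send(int fifo_fd, int s, char *buf, int len)
    {
        struct sock_hdr h;

        h.s = s;
        h.len = len;
        /* Two writes for clarity; a real library would use writev()
         * or a lock, since concurrent FIFO writers can interleave. */
        if (write(fifo_fd, (char *)&h, sizeof h) != sizeof h)
            return -1;
        if (write(fifo_fd, buf, len) != len)
            return -1;
        return len;
    }
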
My conclusion is that it is better to place at least some of the
networking code in the kernel.  My next crack at the KA9Q code will
involve burying much of it in a "socket" device driver.  The device
driver approach lends itself to easier integration into binary-only
systems; most binary-only distributions don't allow the addition of
system calls (among many other things).

karn@jupiter (Phil R. Karn) (04/13/89)

>Indeed.  I spent some time tweaking the KA9Q TCP/IP code to provide a BSD
>socket interface to applications.  [...]  There's a
>gotcha with sockets, however.  Sockets are objects on which it is legal to
>do read() and write() calls (a socket looks like a file descriptor that is
>both readable and writable).

I recently rewrote my package to use a very simple non-pre-emptive
multitasking kernel. It provides what I believe are generally known as
"lightweight processes" -- tasks share external and static variables,
but they have private stacks and therefore private automatic variables.
Tasks are switched with the C-library setjmp/longjmp functions --
simple, effective and about as portable as such code can be. As before,
heavy use is made of the storage allocator to create per-process
data structures that must be maintained between calls to a function or
shared across multiple functions.
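
The switching primitive, stripped down to two tasks on the one real
stack, looks something like the toy sketch below.  (A sketch only:
re-entering a function you have longjmp'd out of is formally undefined,
so no task may keep automatic variables live across a switch; giving
each task the private stack the real package provides takes
machine-dependent fiddling with the saved stack pointer inside the
jmp_buf.)

    #include <setjmp.h>
    #include <stdio.h>

    /* Two cooperating "lightweight processes" sharing one stack.
     * All shared state is static, never automatic, because stack
     * contents are not preserved across a switch. */
    static jmp_buf ping_ctx, pong_ctx;
    static int n;

    static void
    pong_task(void)
    {
        for (;;) {
            if (setjmp(pong_ctx) == 0)   /* save where we are...  */
                longjmp(ping_ctx, 1);    /* ...and yield to ping  */
            printf("pong %d\n", n);
        }
    }

    static void
    ping_task(void)
    {
        if (setjmp(ping_ctx) == 0)
            pong_task();                 /* park pong deeper on the stack */
        for (n = 1; n <= 3; n++) {
            printf("ping %d\n", n);
            if (setjmp(ping_ctx) == 0)   /* save where we are...  */
                longjmp(pong_ctx, 1);    /* ...and switch to pong */
        }
    }

    int
    main(void)
    {
        ping_task();    /* prints ping 1, pong 1, ... ping 3, pong 3 */
        return 0;
    }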

This made it possible to implement a socket layer as a "veneer" on top
of the older upcall mechanism, and to rewrite all of the applications in
terms of the new socket interface. (I was amazed at how much simpler the
FTP client became!)

Pretty much the whole set of Berkeley socket calls is provided with
similar semantics, albeit with the caveat that I didn't test Berkeley's
code to see exactly how it behaved in all sorts of strange exception
conditions. In doing this job (now the so-called "NOS" version) I ran
smack into the problem you report. I would have very much liked to use a
common file descriptor space for both sockets and regular file
descriptors, but this just didn't seem possible to do in a portable
manner. So the socket descriptor space is distinct, and you can't do
read, write or close calls on socket descriptors (a separate call,
close_s, is provided for closing sockets).

A snapshot of my current development work can be found on
flash.bellcore.com under /pub/ka9q/src.arc. An executable is in
/pub/ka9q/net.exe.  Please don't pepper me with a lot of questions; it's
still being polished and I haven't had time to write the documentation
yet.

Phil

huitema@mirsa.inria.fr (Christian Huitema) (04/13/89)

From article <15285@bellcore.bellcore.com>, by karn@jupiter (Phil R. Karn):
> ........... In doing this job (now the so-called "NOS" version) I ran
> smack into the problem you report. I would have very much liked to use a
> common file descriptor space for both sockets and regular file
> descriptors, but this just didn't seem possible to do in a portable
> manner....

The same problem was encountered with a lot of the early implementations
of the OSI transport.  A number of them were done outside the kernel,
either because the programmers had no access to the kernel at all, or
because they wanted to minimize the development of kernel code, or
because they balked at increasing the kernel by umpteen kbytes for just
a seldom-used protocol.  Most implementations came with a single process
that would interface to the network (X.25 or Ethernet, in most cases),
manage the protocol engine and provide some form of interface to the
user processes.  Some used named pipes; others used their own form of
``communication drivers'', whose design was not unlike that of
pseudo-ttys.  All ran into a number of problems:

1- Compatibility with the socket library is extremely hard to achieve.
In particular, sockets can be created dynamically by the ``socket'' and
``accept'' calls; this feature is embedded in most of the Berkeley
applications, and you have to devise workarounds.

2- Performance is poor, as each packet will wake up the transport
process first, then the application process.  Switching from one task
to another is by no means what Unix does best.  Moreover, there can be
nasty side effects, like the transport process using a lot of CPU and
thus getting lower and lower priority in the Unix scheduler.

There is a way to circumvent the second point, which is to provide a
subroutine implementation of the transport, for architectures where a
single user process serves multiple connections.  The key is that the
application must be the sole user of a particular network address, and
that it should be ready to multiplex its various connection contexts,
e.g. using some form of lightweight process.  It should also call the
transport manager at reasonable intervals, so that the protocol machine
does not break (its timers still have to fire on time).
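
In outline, such a subroutine transport turns the application into a
single select() loop -- roughly as follows (a sketch; transport_input(),
transport_timers() and serve_ready_connections() are invented names for
this illustration):

    #include <sys/select.h>
    #include <sys/time.h>

    /* net_fd is the raw network device (X.25 or Ethernet).
     * transport_input() runs one incoming packet through the
     * protocol engine; transport_timers() drives retransmission and
     * other timers; serve_ready_connections() does the
     * application-level work.  All three are hypothetical. */
    extern int  net_fd;
    extern void transport_input(void);
    extern void transport_timers(void);
    extern void serve_ready_connections(void);

    void
    server_loop(void)
    {
        fd_set rfds;
        struct timeval tv;

        for (;;) {
            FD_ZERO(&rfds);
            FD_SET(net_fd, &rfds);
            tv.tv_sec = 1;              /* wake at least once a second */
            tv.tv_usec = 0;
            if (select(net_fd + 1, &rfds, (fd_set *)0,
                       (fd_set *)0, &tv) > 0)
                transport_input();      /* a packet is waiting */
            transport_timers();         /* keep the protocol machine alive */
            serve_ready_connections();
        }
    }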

This architecture is commonly used for information servers, which need
to handle a very large number of connections (several hundred) and
would be penalized by the standard UNIX architecture: either the number
of connections would be limited by the maximum number of file
descriptors per task, or by the maximum number of file descriptors per
system, or the number of tasks would become so large that the system
would be inefficient, and the interprocess synchronisation a nightmare.
Actually, it could be a very reasonable design for an X Window terminal.

Christian Huitema

nick@spider.co.uk (Nick Felisiak) (04/25/89)

In-Reply-To: Message from "David M. Balenson" of Mar 30, 89 at 9:31 am

> Is it possible to implement TCP/IP in a UNIX (Berkeley 4.{2,3}, SunOS{3,4})
> system OUTSIDE of the kernel?  I presume doing so would have a major impact
> on efficiency, but it might be much easier to program.  Does anyone know of
> any such TCP/IP implementations?  Thanks.
> 
> -David M. Balenson
>  Trusted Information Systems
>  (301) 854-5358


David,

At Spider we have written a "clean" version of the AT&T STREAMS package
(i.e. no AT&T code, but similar functionality), mainly as an aid to
debugging our STREAMS software (TCP, X.25, ISO).  Using this, I run the
whole "TCP" stack (IP, ARP, etc.) in one process, and communicate from
applications using either UNIX IPC (msgget, msgsnd, etc.), named pipes,
or 'real' streams.
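
The message-queue path, for example, is just standard System V IPC on
the application side -- roughly this (a sketch; the rendezvous key and
the request format are inventions for the illustration):

    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/msg.h>
    #include <string.h>

    #define TCP_STACK_KEY ((key_t)0x5443)   /* invented rendezvous key */

    struct tcp_request {
        long mtype;             /* required by msgsnd: request type */
        char text[64];          /* e.g. "open <host> <port>"        */
    };

    /* Hand a request to the user-level TCP stack's message queue.
     * Returns 0 on success, -1 on error. */
    int
    tcp_request(char *req)
    {
        struct tcp_request m;
        int qid;

        qid = msgget(TCP_STACK_KEY, 0);     /* queue created by the stack */
        if (qid < 0)
            return -1;
        m.mtype = 1;
        strncpy(m.text, req, sizeof m.text - 1);
        m.text[sizeof m.text - 1] = '\0';
        return msgsnd(qid, &m, sizeof m.text, 0);
    }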

A standard raw ethernet driver is used to communicate with the outside world.

Performance is really not as bad as you might fear; measurements
indicate around 50% of the performance of the in-kernel implementation.

Makes portability really good, though!

Nick Felisiak				nick@spider.co.uk
Spider Systems, Edinburgh

+ 44 31 554 9424