[net.unix] Asynchronous I/O on UNIX?

glenn@ll-xn.ARPA (Glenn Adams) (11/21/85)

> 
> As more and more mainframe systems are moving to UNIX, I am
> very interested in finding out how asynch I/O is being implemented
> on these systems.
> 

This was one of the first complaints I had about UNIX after having used
operating systems which imposed fewer constraints on how user processes
performed synchronization on I/O completion.  There are a few things
worth contemplating here.  First, why there is no asynchronous I/O
mechanism in UNIX, and second, how such a mechanism might be implemented.

Before getting down to details, it should be pointed out that there
are other, conceptually cleaner methods for performing overlapped
I/O, namely using multiple processes, each of which has no more
than one outstanding (blocking) I/O request.  This form of overlapped
I/O results in a conceptually straightforward implementation but is
costly in terms of efficiency.  This is especially true due to the
hard boundaries maintained between process address spaces and the lack
of a shared memory mechanism (non SYSV).  In addition, the overhead from context
switching contributes to an overall inefficiency.  Moreover, the current
mechanisms in UNIX for interprocess communication, e.g., pipes, sockets,
or files, all result in the copying of data to and from the kernel address
space as it is being transferred to the destination process.  This introduces
more inefficiencies.

There is, however, the select() system call in 4.[23]BSD UNIX which allows
a timed blocking poll of multiple potentially outstanding I/O activities.
This is many times more efficient than the older busy-wait polling method
using the FIONREAD ioctl(), which in any case was usable only for a limited
set of I/O activities, e.g., read().  Given these various
mechanisms for performing multiple I/O activities, most applications have
chosen to make do with them rather than address the more difficult task
of implementing a more general kernel-based asynchronous I/O mechanism.
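
To make the select() approach concrete, here is a minimal sketch (mine, not
taken from any particular program), written against the 4.2BSD interface in
which the descriptor sets are plain int bit masks:

	#include <sys/time.h>

	/*
	 * Wait up to five seconds for input on the descriptors whose bits
	 * are set in *maskp.  On return *maskp holds the ready descriptors.
	 * Returns the count of ready descriptors, 0 on timeout, -1 on error.
	 */
	int
	waitinput(maskp)
		int *maskp;
	{
		struct timeval tv;

		tv.tv_sec = 5;
		tv.tv_usec = 0;
		return (select(32, maskp, (int *)0, (int *)0, &tv));
	}

A caller would use it along the lines of

	mask = (1 << fd1) | (1 << fd2);
	if (waitinput(&mask) > 0 && (mask & (1 << fd1)))
		n = read(fd1, buf, sizeof buf);

since select() rewrites the mask in place to show which descriptors are
actually ready.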

My efforts originated while implementing an I/O intensive signal processing
application under RSX-11M/S using Whitesmith's C.  My first job was to
throw out the junk Whitesmith's called a standard I/O library and make it
more similar to UNIX V7.  I actually used the 4.2BSD stdio with the addition
of most V7 system calls which were mapped to RSX Executive Calls of one
sort or another.  Since, for this application, I had a strong need for the
efficiency of asynchronous I/O, I needed some UNIX-like mechanism for
implementing it.  What I ended up with is as follows:  prior to performing
an I/O operation, e.g., read(), write(), or ioctl(), an fcntl() call is
performed with a command argument of F_ASYNC, and an argument which points
to an Asynchronous Control Block.  This argument structure contains the
address of the asynchronous I/O handler and an optional argument to be
passed to the handler.  The optional argument is used to communicate application
specific information to the handler about the subsequent I/O activity.
The handler is invoked upon I/O completion as follows:
	(*handler)(status, opt-argument);
Thus the status code indicating the success/failure of the I/O activity is
communicated along with the optional argument specified in the Asynchronous
Control Block.
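
In rough C (the declarations below are my own reconstruction; the actual
structure and member names under RSX may well differ), the two phases look
something like this:

	/* Hypothetical Asynchronous Control Block. */
	struct acb {
		int	(*acb_handler)();	/* called as (*handler)(status, arg) */
		char	*acb_arg;		/* optional per-request argument */
	};

	extern int rdone();			/* application's completion handler */
	char	buf[512];

	startread(fd)
		int fd;
	{
		static struct acb a;

		a.acb_handler = rdone;
		a.acb_arg = (char *)buf;	/* identifies this request to rdone() */
		fcntl(fd, F_ASYNC, &a);		/* arming phase */
		read(fd, buf, sizeof buf);	/* execution phase: returns at once;
						   (*rdone)(status, a.acb_arg) is run
						   when the transfer completes */
	}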

It may be argued that a cleaner mechanism could be implemented, especially
since this calls for two stages, i.e., arming and execution phases.  However,
I felt that it was better to do it this way than to add another argument to
all I/O related system calls, or even worse, to add yet more system calls.
From the application programmer's perspective, this mechanism is quite simple to
use and builds upon existing system calls.  The semantics of handler invocation
are quite simple and result in a clean interface with minimal global data
communication.  Furthermore, since this mechanism enables asynchronous
notification on a per descriptor basis, it is possible to have outstanding
I/O on multiple descriptors.  Further still, since an optional argument is
specified on a per I/O request basis, i.e., the optional argument in the
Asynchronous Control Block, it is possible to have multiple outstanding I/O
requests on a single descriptor and use this optional argument to identify
the request.
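
As a sketch of that last point, using the same invented declarations as above,
two requests can be left outstanding on one descriptor and told apart in the
handler purely by the argument carried in each control block:

	a1.acb_handler = rdone;  a1.acb_arg = (char *)buf1;
	a2.acb_handler = rdone;  a2.acb_arg = (char *)buf2;

	fcntl(fd, F_ASYNC, &a1);
	read(fd, buf1, sizeof buf1);	/* first outstanding request  */
	fcntl(fd, F_ASYNC, &a2);
	read(fd, buf2, sizeof buf2);	/* second outstanding request */

	/* rdone(status, arg) tests whether arg is buf1 or buf2 */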

For the application for which I implemented this mechanism, it was necessary
to have overlapping I/O on multiple devices and to have multiple outstanding
requests enqueued to a single device.  The latter was necessary to reduce I/O
turnaround latency on devices with very small data overrun periods, e.g.,
an unbuffered A/D converter.  I have glossed over a few details here, such
as the obvious need to protect critical sections of code from asynchronous
entry.

Now that I have had some success with this particular interface mechanism
for performing asynchronous I/O, I am considering how it might best be
implemented in the 4.[23]BSD environment.  I haven't scoped the problem
enough at this point to be able to state how difficult this will be.  One
problem that I see already is the fact that different device drivers use
different mechanisms to perform synchronization.  Some use iowait(), and others
call sleep() directly.  If a single mechanism were used, e.g.,
iowait(), then the task would be much easier.  Those drivers that use
iowait() could now be easily converted to enable asynchronous notification
since a hook could be placed in iowait() to allow the process to continue and
then to notify the user process when the driver calls iodone().  However,
the other drivers would be much more difficult since they don't necessarily
follow this strict protocol, i.e., calling iowait() and then iodone().
The actual notification could come via the psignal() mechanism with a special
signal (SIGIO ?) being used to get things going.
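
To sketch the idea (the buffer flag below is invented, and the bookkeeping
needed to find the right process is glossed over), the hook could be a small
routine called from iodone():

	/*
	 * Hypothetical: B_ANOTIFY marks a transfer whose originating process
	 * asked for asynchronous notification; b_proc is assumed to point at
	 * the requesting process.
	 */
	ionotify(bp)
		register struct buf *bp;
	{
		if ((bp->b_flags & B_ANOTIFY) && bp->b_proc)
			psignal(bp->b_proc, SIGIO);
	}

On the user side, critical sections could then be protected in the usual
4.2BSD way, by holding SIGIO with sigblock() while they run.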

I'm not sure when or if I will get an opportunity to try implementing these
ideas in the UNIX kernel; however, I thought it might be interesting to discuss
the ideas that I've had on the subject in case you or others are interested
in actually doing the implementation.
-- 

Glenn Adams
MIT Lincoln Laboratory

ARPA: 	glenn@LL-XN.ARPA
CSNET:	glenn%ll-xn.arpa@csnet-relay
UUCP:	...!seismo!ll-xn!glenn
	...!ihnp4!houem!ll-xn!glenn

paul@unisoft.UUCP (n) (11/30/85)

<oog>

	I too would like to see some form of async I/O under Unix; after
all, VMS and even RT11 and the Macintosh!! have it.

	The main reason I see against it is that it would massively complicate
Unix's very simple user interface (read, and when you are done reading do
something else ...).  On the other hand you could avoid the problems with some
utilities that have to use more than 1 process to do some very simple things
(cu, for example, runs 2 processes, one for each direction; typing a character
results in 2 process switches!!!!).

	The major problem in implementation under Unix is the concept of I/O
context: in Unix an I/O's context (u_base, u_count, u_offset) lives in the
process's 'udot'.  If you are going to be doing multiple I/Os you need to be
able to send this information to the device with the queued I/O (you also have
to send the process ID, the completion routine's virtual address, and the
parameter you are going to give to the completion routine).  There is also an
amazing amount of internal synchronisation work that needs to be done in the
kernel.  And while you are doing that you may as well make it orthogonal
so that all the system calls can have completion routines.
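
	Just to make that I/O context concrete, it amounts to something like
the following structure (the names are invented; nothing of the sort exists
in the stock kernel):

	struct aioreq {
		caddr_t	aio_base;	/* user buffer       (now u.u_base)   */
		int	aio_count;	/* transfer count    (now u.u_count)  */
		off_t	aio_offset;	/* file offset       (now u.u_offset) */
		short	aio_pid;	/* process to notify on completion    */
		int	(*aio_done)();	/* completion routine (user virtual address) */
		caddr_t	aio_arg;	/* argument handed to the completion routine */
	};

One of these would have to travel with every queued transfer instead of
living in the u. area.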

	Rats!! It seemed so easy!! Still it would be nice to see it done ....
sigh ....


		Paul Campbell
		..!ucbvax!unisoft!paul

jsdy@hadron.UUCP (Joseph S. D. Yao) (12/01/85)

I'm afraid this isn't a great response, but:  I've seen one or two
implementations of asynchronous I/O.  One was done by Steve Holmgren
et al. at U Ill Urbana-Champaign maybe half a dozen years ago.
Obviously, not based on any current system, nor has it been propagated
much, especially since it was part of a version that completely changed
UNIX.

I vaguely remember an interface, though I don't remember whether it comes
from Steve's implementation or not.  Basically, one opens an fd
and posts a signal routine for that fd.  Asread()'s and aswrite()'s
etc. return an object which is compared to (something, I think
retrieved by another system call) at interrupt time.  Not too far
from simple.
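
Purely as a guess at the shape of it from that description (none of these
names are necessarily what the implementation used), it would have gone
something like:

	int id1, id2;

	postasync(fd, iohandler);		/* hypothetical: post a signal
						   routine for this fd        */
	id1 = asread(fd, buf1, sizeof buf1);	/* each returns some handle   */
	id2 = asread(fd, buf2, sizeof buf2);

	/* later, inside iohandler(), at "interrupt" time: */
	if (ascomplete(fd) == id1)		/* ascomplete() stands in for  */
		consume(buf1);			/* whatever call retrieved the */
						/* handle of the finished I/O  */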
-- 

	Joe Yao		hadron!jsdy@seismo.{CSS.GOV,ARPA,UUCP}

wls@astrovax.UUCP (William L. Sebok) (12/03/85)

In article <604@unisoft.UUCP> paul@unisoft.UUCP (n) writes:
>	The major problem in implementation under Unix is the concept of I/O
>context: in Unix an I/O's context (u_base, u_count, u_offset) lives in the
>process's 'udot'.  If you are going to be doing multiple I/Os you need to be
>able to send this information to the device with the queued I/O (you also have
>to send the process ID, the completion routine's virtual address, and the
>parameter you are going to give to the completion routine).

I noticed that in the transition from the 4.1 BSD to the 4.2 BSD kernel,
references in the i/o drivers to u.u_base, u.u_count, u.u_offset are replaced
by references to members of a uio structure.  This was the only change (that I
remember) that I had to make in some locally written drivers I ported.  I
wondered and still wonder if this change means that support of asynchronous
i/o is in the works in BSD.
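
For reference, the structure that replaced those u-dot fields bundles the
per-transfer state into a free-standing object; it looks roughly like this
in 4.2BSD's <sys/uio.h>:

	struct iovec {
		caddr_t	iov_base;	/* base of one user buffer segment */
		int	iov_len;	/* length of that segment          */
	};

	struct uio {
		struct	iovec *uio_iov;	/* scatter/gather list             */
		int	uio_iovcnt;	/* number of iovecs                */
		off_t	uio_offset;	/* offset into the file or device  */
		int	uio_segflg;	/* user or kernel address space    */
		int	uio_resid;	/* bytes remaining to transfer     */
	};

Since a uio is self-contained, it could at least in principle be queued along
with an I/O request rather than living in the u. area, which is what makes
the change look suggestive.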
-- 
Bill Sebok			Princeton University, Astrophysics
{allegra,akgua,cbosgd,decvax,ihnp4,noao,philabs,princeton,vax135}!astrovax!wls