[comp.lang.misc] UNIX semantics do permit full support for asynchronous I/O

steve@nuchat.UUCP (Steve Nuchia) (08/29/90)

On the subject of asynchronous I/O in Unix:  I've come up with
what I consider a rather slick way of making it fit neatly
into Unix's way of doing things:

Have read(2) and write(2) calls map the pages containing the buffers
out of the user address space and return immediately.  Once the
data have been copied (DMAed?) to/from the buffers, map the pages back in.

A user program that is not aware of the subterfuge will then run
along for some (probably short) time and trap on an attempt to
refill or inspect the buffer.  It will then be blocked until
the request completes.  A savvy program will do something else
for as long as it can, then take a peek at the buffer when it
has run out of busy work.  One would probably also provide
(grudgingly, in my case) an explicit call for discovering the status.
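
To make that concrete, a savvy program might look something like this
(a sketch only; iodone() is the hypothetical status call I said I'd
grudgingly provide, and the helpers are stand-ins):

    /* Sketch of a "savvy" program under the proposed scheme.  read()
     * returns at once; the buffer's pages stay mapped out until the
     * data actually arrive, so touching buf early just blocks.
     */
    #include <unistd.h>

    extern int  iodone(int fd);           /* hypothetical status call */
    extern void do_useful_work(void);
    extern void process(char *buf, int n);

    void savvy_read(int fd)
    {
        static char buf[8192];            /* static: off the stack    */
        int n = read(fd, buf, sizeof buf);    /* returns "immediately" */

        while (n > 0 && !iodone(fd))      /* overlap computation with I/O */
            do_useful_work();
        if (n > 0)
            process(buf, n);              /* would fault and block here
                                             if the I/O weren't done   */
    }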

(Note that such a call might be useful to a program that
wished to control its paging behaviour if it were written with
sufficient generality.)

The scheme will only provide asynchronicity in cases where the
return value for the read or write call is known early.  This
will be the case primarily for files, but other cases can be
made to take advantage of it.

The performance characteristics would be similar to using mmap
but would apply to programs written in normal unix style and
to dusty decks.  One must of course take some care in implementing
the scheme, and there is no doubt the usual raft of gotchas that
comes up when doing anything involving memory management.  The
case of write(1,buf,read(0,buf,sizeof(buf))) is entertaining to
contemplate, for instance.

The result: good performance with V7 file system call semantics.  Programs
work whether the feature is in the kernel or not, and whether they
are written to take advantage of it or not.  I sure wish I
had thought of it a long time ago (like while the standards
were still soft).

I would appreciate any comments that wizards with more kernel
internals experience might have.  If it turns out I've merely rediscovered
something well-known, I think I shall slit my wrists.

(For completeness I will note that to use this scheme intelligently
you must be able to discover the relevant properties of the memory
management implementation.  This is nothing new for high performance
programs in a paged environment, but unless it's been added recently
there isn't a standard way to do it.  Whether this is properly a
language or a system interface issue is best left to another debate.)
-- 
Steve Nuchia	      South Coast Computing Services      (713) 964-2462
"To learn which questions are unanswerable, and _not_to_answer_them;
this skill is most needful in times of stress and darkness."
		Ursula LeGuin, _The_Left_Hand_of_Darkness_

lewine@dg-rtp.dg.com (Donald Lewine) (08/29/90)

In article <27619@nuchat.UUCP>, steve@nuchat.UUCP (Steve Nuchia) writes:
|> (For completeness I will note that to use this scheme intelligently
|> you must be able to discover the relevant properties of the memory
|> management implementation.  This is nothing new for high performance
|> programs in a paged environment, but unless it's been added recently
|> there isn't a standard way to do it.  Whether this is properly a
|> language or a system interface issue is best left to another debate.)
|> -- 

That last remark defeats your entire suggestion.  If I have to
"discover the relevant properties of the memory management 
implementation", all dusty decks will fail.  

If you blindly map the page(s) containing "buf" out of the user's
address space you will map out other variables that the user may
want.  It is not possible for the compiler to know that buf must
be on a page by itself.  How could you implement your scheme?

Also the read() and write() functions return the number of characters
read or written.  How do you know this before the read() or write()
completes?  Do you assume that all disks are error free and never
fail?  That is a poor assumption!

I don't think that your idea works at all.  For a scheme that does
almost exactly this, but with the cooperation of the user program,
look at PMAP under DEC's TOPS-20 operating system or ?SPAGE under
DG's AOS/VS.

--------------------------------------------------------------------
Donald A. Lewine                (508) 870-9008
Data General Corporation        (508) 366-0750 FAX
4400 Computer Drive, MS D112A
Westboro, MA 01580  U.S.A.

uucp: uunet!dg!lewine   Internet: lewine@cheshirecat.webo.dg.com

rsc@merit.edu (Richard Conto) (08/30/90)

In article <27619@nuchat.UUCP> steve@nuchat.UUCP (Steve Nuchia) writes:
>On the subject of asynchronous I/O in Unix:  I've come up with
>what I consider a rather slick way of making it fit neatly
>into Unix's way of doing things:
>
>Have read(2) and write(2) calls map the pages containing the buffers
>out of the user address space and return immediately.  Once the
>data have been copied (DMAed?) to/from the buffers, map the pages back in.
>
>A user program that is not aware of the subterfuge will then run
>along for some (probably short) time and trap on an attempt to
>refill or inspect the buffer.  It will then be blocked until
>the request completes.  A savvy program will do something else
>for as long as it can, then take a peek at the buffer when it
>has run out of busy work.  One would probably also provide
>(grudgingly, in my case) an explicit call for discovering the status.
 
A buffer is not necessarily aligned on a page boundary. And a page
may contain more than one variable.  The savvy program would have to
design its data structures (including local variable arrangement, if
a buffer happens to be there) to be aware of whatever peculiar way
the compiler lays out variables and whatever peculiar granularity the
OS has for pages.

Make it simpler.  Have a routine that requests an I/O operation.  Another
routine that can check its status.  A way of specifying a routine to be
called when the I/O operation completes might be yet another option.
I'm afraid that your idea adds unnecessary complexity (and system
dependencies).  And using a construct like
'write(fdout,buf,read(fdin,buf,sizeof(buf)))' is asking for trouble
when 'read()' returns an error condition.
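
Something with roughly this shape, say (a sketch; every name here is
invented):

    /* Sketch of the simpler interface; all names are made up. */
    typedef void (*io_done_fn)(int request, int result);

    int aio_request(int fd, void *buf, unsigned len,
                    int writing, io_done_fn done);  /* start; returns an id */
    int aio_status(int request);                    /* <0 error, 0 pending,
                                                       >0 bytes transferred */

A program starts a transfer, goes about its business, and polls
aio_status() (or just waits for the callback) when it runs out of
busy work.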

--- Richard

jlg@lanl.gov (Jim Giles) (08/30/90)

From article <27619@nuchat.UUCP>, by steve@nuchat.UUCP (Steve Nuchia):
> On the subject of asynchronous I/O in Unix:  I've come up with
> what I consider a rather slick way of making it fit neatly
> into Unix's way of doing things:
> 
> Have read(2) and write(2) calls map the pages containing the buffers
> out of the user address space and return immediately.  Once the
> data have been copied (DMAed?) to/from the buffers, map the pages back in.
> [...]

Yes, this will work.  I believe that MACH already does this.
Unfortunately, this idea has two problems: 1) not all machines are
paged/segmented; 2) not all I/O requests are a multiple of the
pagesize.  The first problem is the more severe one - hardware designers
avoid pages/segments when designing for speed.  The paging hardware costs
roughly 10% in speed, or about that much in added hardware cost, so it is
avoided (Crays have neither pages nor segments).  The pagesize problem
just means that you'd have to map out more memory than is actually
involved in the I/O request.  This means that the user might get
blocked on memory that is really perfectly safe to access - a minor
source of slowdown.

J. Giles

merriman@ccavax.camb.com (08/30/90)

In article <1990Aug29.170931.10853@terminator.cc.umich.edu>, 
	rsc@merit.edu (Richard Conto) writes:

> 
> Make it simpler. Have a routine that requests an I/O operation. Another
> routine that can check its status. A way of specifying a routine to be
> called when the I/O operation completes might be yet another option.

Sure sounds like VMS QIO calls.

bdsz@cbnewsl.att.com (bruce.d.szablak) (08/30/90)

In article <1990Aug29.170931.10853@terminator.cc.umich.edu>, rsc@merit.edu (Richard Conto) writes:
> In article <27619@nuchat.UUCP> steve@nuchat.UUCP (Steve Nuchia) writes:
> >Have read(2) and write(2) calls map the pages containing the buffers
> >out of the user address space and return immediately.
>  
> A buffer is not necessarily aligned on a page boundary. And a page
> may contain more than one variable.

Actually, the OS only has to mark the pages as copy-on-write.  This sort
of thing is often done when a process forks, to avoid making a copy of the
data space for the child.  Whether it's worth it is another matter.

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (08/31/90)

In article <861@dg.dg.com> uunet!dg!lewine writes:
> In article <27619@nuchat.UUCP>, steve@nuchat.UUCP (Steve Nuchia) writes:
> > (For completeness I will note that to use this scheme intelligently
> > you must be able to discover the relevant properties of the memory
> > management implementation.
> That last remark defeats your entire suggestion.  If I have to
> "discover the relevant properties of the memory management 
> implementation", all dusty decks will fail.  

No. Steve's point was that on paged architectures, he can get a low-cost
speedup out of some programs without any change in semantics. This is a
worthwhile change.

Discovering memory management can be as simple as having a system call
getpage() that returns a char buffer taking up exactly one page. Any
code that understands this can take full advantage of asynchronous I/O.
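
Such a getpage() needn't even be new kernel work; on a BSD-flavored
system it could be approximated in user code (a sketch, assuming the
old getpagesize() and valloc() calls, which are BSDisms and not
universal):

    /* One possible user-level approximation of getpage(). */
    extern int   getpagesize(void);      /* 4.2BSD                  */
    extern char *valloc(unsigned size);  /* page-aligned allocation */

    char *getpage(void)
    {
        return valloc((unsigned) getpagesize());
    }

I/O done into such a buffer maps out nothing but the I/O pages
themselves.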

> If you blindly map the page(s) containing "buf" out of the users
> address space you will map out other variables that the user may
> want.  It is not possible for the compiler to know that buf must
> be on a page by itself.  How could you implement your scheme?

So what if there are other variables on the page? The worst that happens
is that the page gets mapped out and then back in; on paged hardware,
this cost is negligible. The best that happens is that the program uses
getpage() and guarantees that it will wait for the I/O to finish on that
page.

> Also the read() and write() functions return the number of characters
> read or written.  How do you know this before the read() or write()
> completes?  Do you assume that all disks are error free and never
> fail?  That is a poor assumption!

So what? read() and write() already return before data gets written out
to disk; assuming that you see all I/O errors before a sync is the poor
assumption! This is irrelevant to the issue at hand.

---Dan

stripes@eng.umd.edu (Joshua Osborne) (08/31/90)

In article <61535@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>From article <27619@nuchat.UUCP>, by steve@nuchat.UUCP (Steve Nuchia):
[...]
>> Have read(2) and write(2) calls map the pages containing the buffers
>> out of the user address space and return immediately.  Once the
>> data have been copied (DMAed?) to/from the buffers, map the pages back in.
>> [...]
>
>Yes, this will work.  I believe that MACH already does this.
>Unfortunately, this idea has two problems: 1) [omitted]
>2) not all I/O requests are a multiple of the pagesize.
>The pagesize problem
>just means that you'd have to map out more memory than is actually
>involved in the I/O request.  This means that the user might get
>blocked on memory that is really perfectly safe to access - a minor
>source of slowdown.

It shouldn't be a source of slowdown in the read case; normally the program
would not get control until after the read was done, so the new worst case is
exactly the same.  For writes, however, the old best case is far better than
the new worst case, and that case comes up relatively often.

The old best case: you do a large write, the kernel copies the data and lets
you run, and writes it out some time later; then you re-use the buffer.

If you do that under the new system: you do the large write, the kernel maps
out the pages and gives you control, you re-use the buffer, and the kernel
makes you sleep until it can do the write.  You lose out.  A lot of programs
do this; stdio currently does this.  Of course stdio would need a bit of
tweaking anyway (align page-sized buffers on page boundaries).  While we are
in there we could make writes use two page buffers and flush alternate
ones... and do huge writes directly out of the user's space.
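
Roughly like this, say (a sketch only; it assumes the proposed kernel
scheme and two page-aligned, page-sized buffers, e.g. from valloc()):

    /* Sketch of double-buffered stdio-style writes under the scheme:
     * while the kernel still has one page mapped out, we fill the
     * other one instead of blocking.
     */
    #include <unistd.h>

    static char *page[2];                /* two page-aligned buffers */
    static int   filling;                /* index of the one we fill */

    static void flush_current(int fd, int nbytes)
    {
        write(fd, page[filling], nbytes);   /* returns at once; that page
                                               is mapped out until written */
        filling = !filling;                 /* fill the other page while
                                               the first one drains        */
    }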

(just things to think about, I do like the idea...)
-- 
           stripes@eng.umd.edu          "Security for Unix is like
      Josh_Osborne@Real_World,The          Mutitasking for MS-DOS"
      "The dyslexic porgramer"                  - Kevin Lockwood
"Isn't that a shell script?"                                    - D. MacKenzie
"Yeah, kinda sticks out like a sore thumb in the middle of a kernel" - K. Lidl

peter@ficc.ferranti.com (Peter da Silva) (08/31/90)

In article <861@dg.dg.com> uunet!dg!lewine writes:
> That last remark defeats your entire suggestion.  If I have to
> "discover the relevant properties of the memory management 
> implementation", all dusty decks will fail.  

You only have to do this if you want to get some additional
performance. It should still work regardless.

> Also the read() and write() functions return the number of characters
> read or written.  How do you know this before the read() or write()
> completes?  Do you assume that all disks are error free and never
> fail?  That is a poor assumption!

Well, it's an assumption made for write() anyway. For read() you can just
treat it like any disk error on any other faulted-in page and blow off the
process. Disk errors are a very rare occurrence, and almost always require
human intervention anyway. Any other return value for read is known
ahead of time.
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
peter@ferranti.com

lfd@cbnewsm.att.com (leland.f.derbenwick) (08/31/90)

In article <27619@nuchat.UUCP>, steve@nuchat.UUCP (Steve Nuchia) writes:
> On the subject of asynchronous I/O in Unix:  I've come up with
> what I consider a rather slick way of making it fit neatly
> into Unix's way of doing things:
> 
> Have read(2) and write(2) calls map the pages containing the buffers
> out of the user address space and return immediately.  Once the
> data have been copied (DMAed?) to/from the buffers, map the pages back in.
> 
> A user program that is not aware of the subterfuge will then run
> along for some (probably short) time and trap on an attempt to
> refill or inspect the buffer.  It will then be blocked until
> the request completes.  A savvy program will do something else
> for as long as it can, then take a peek at the buffer when it
> has run out of busy work.  One would probably also provide
> (grudgingly, in my case) an explicit call for discovering the status.

Apart from the implementation problems that others have mentioned,
_this suggestion breaks existing code_.

In essentially any serious database application, a completed
write() to a raw disk is treated as a guarantee that the data
block has been _physically written to the device_.  (This is
needed to ensure reliable transaction behavior in the presence
of potential system crashes.)  Since your suggestion would void
that guarantee, it is not benign.

On the other hand, I like your idea of implementing asynchronous
behavior using the ordinary read() and write() calls.  So how
difficult would it be to add a couple ioctl's to the existing
raw disk driver to support that?

One ioctl would select sync/async reads/writes (the default would
be the present behavior: sync read, sync write).  The other ioctl
would do the status inquiry.  With these, asynchronous behavior
is available on demand, and the OS doesn't need to jump through
any hoops to make it transparent: it's up to the user to use the
facility properly.

This is a lot cleaner than implementing asynchronous I/O in user
mode with shared memory and a background process...

 -- Speaking strictly for myself,
 --   Lee Derbenwick, AT&T Bell Laboratories, Warren, NJ
 --   lfd@cbnewsm.ATT.COM  or  <wherever>!att!cbnewsm!lfd

brians@hpcljms.HP.COM (Brian Sullivan) (08/31/90)

> Have read(2) and write(2) calls map the pages containing the buffers
> out of the user address space and return immediately.  Once the
> data have been copied (DMAed?) to/from the buffers, map the pages back in.

  Well, I suspect that this solution will not work with most C programs.
Why?  Because most of the time read(2) and write(2) are called with a
buffer that is an auto variable, i.e. stack allocated.  Telling the MMU to
unmap a page in the user's stack will have very dire consequences.  Maybe
a better idea would be a new memory allocator, combined with new read(2)
and write(2) calls.  This method would eliminate the need to copy the
user's data.

   kalloc :  allocate pages in kernel memory
   kwrite :  perform write without copying memory into kernel data area
   kread  :  perform read without copying memory into kernel data area
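
Used together they might look like this (a sketch; none of these calls
exist, they are only the suggestion above):

    /* Sketch of how the proposed calls might fit together. */
    extern char *kalloc(unsigned nbytes);  /* kernel-resident buffer */
    extern int   kread(int fd, char *kbuf, unsigned n);
    extern int   kwrite(int fd, char *kbuf, unsigned n);

    int copy_once(int infd, int outfd, unsigned size)
    {
        char *kbuf = kalloc(size);       /* data need never be copied
                                            across the user boundary  */
        int n = kread(infd, kbuf, size);

        return n > 0 ? kwrite(outfd, kbuf, n) : n;
    }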


  You might want to investigate the POSIX working group P1003.4 which
deals with real-time issues.  They are creating the next generation
UNIX standard that will support threads, async_io, real-time scheduling
and other real-time issues.

utoddl@uncecs.edu (Todd M. Lewis) (08/31/90)

In article <31445.26dc0466@ccavax.camb.com> merriman@ccavax.camb.com writes:
>In article <1990Aug29.170931.10853@terminator.cc.umich.edu>, 
>	rsc@merit.edu (Richard Conto) writes:
>
>> 
>> Make it simpler. Have a routine that requests an I/O operation. Another
>> routine that can check its status. A way of specifying a routine to be
>> called when the I/O operation completes might be yet another option.
>
>Sure sounds like VMS QIO calls.

Sounds like the Amiga's OS to me.  And UNIX doesn't do this?
I'm trying to be a UNIX nut in training, but I keep hearing about
these new tricks that seem to be rather hard to teach the
old dog.  I'd hate to wake up in 5 years and realize that UNIX
had become to workstations what MS-DOS is to PCs now.  Somebody
pinch me.

meissner@osf.org (Michael Meissner) (08/31/90)

In article <960022@hpcljms.HP.COM> brians@hpcljms.HP.COM (Brian
Sullivan) writes:

| > Have read(2) and write(2) calls map the pages containing the buffers
| > out of the user address space and return immediately.  Once the
| > data have been copied (DMAed?) to/from the buffers, map the pages back in.
| 
|   Well, I suspect that this solution will not work with most C programs.
| Why?  Because most of the time read(2) and write(2) are called with a
| buffer that is an auto variable, i.e. stack allocated.  Telling the MMU to
| unmap a page in the user's stack will have very dire consequences.  Maybe
| a better idea would be a new memory allocator, combined with new read(2)
| and write(2) calls.  This method would eliminate the need to copy the
| user's data.

What are you talking about?  How is unmapping a page because of an I/O
request (with an intent to map it back in when the I/O completes and
having page faults wait for the I/O completion) any different from any
normal time the OS unmaps a page to reuse the memory for something
else.  That is not a dire consequence; that is how virtual memory
works.
--
Michael Meissner	email: meissner@osf.org		phone: 617-621-8861
Open Software Foundation, 11 Cambridge Center, Cambridge, MA, 02142

Do apple growers tell their kids money doesn't grow on bushes?

pplacewa@bbn.com (Paul W Placeway) (09/01/90)

peter@ficc.ferranti.com (Peter da Silva) writes:

< ... Disk errors are a very rare occurrence, and almost always require
< human intervention anyway. Any other return value for read is known
< ahead of time.

Unless your "disk" is RFS and the remote machine crashes, or soft
mounted NFS and any one of about a zillion things happens...


		-- Paul Placeway

eliot@dg-rtp.dg.com (Topher Eliot) (09/01/90)

In article <12023:Aug3017:24:1590@kramden.acf.nyu.edu>,
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
|> > Also the read() and write() functions return the number of characters
|> > read or written.  How do you know this before the read() or write()
|> > completes?  Do you assume that all disks are error free and never
|> > fail?  That is a poor assumption!
|> 
|> So what? read() and write() already return before data gets written out
|> to disk; assuming that you see all I/O errors before a sync is the poor
|> assumption! This is irrelevant to the issue at hand.

I think this point very well covers the case of writing:  if you want
to be sure it really got to disk, you need to do an fsync(), and even
then I'm not sure you can be sure (doesn't the fsync just *mark* the
buffers for writing out?).  We could certainly arrange for fsync to
block until everything is really on disk.

But consider the reading case.  Here, we could tell the process how
many bytes it had "gotten" (was going to get), even if it was less
than the process had requested (presumably the kernel knows how big
the file is, without having to read all those bytes off disk).  The
application then might do something *other than examining the bytes
it just "read"* based on this knowledge of a "successful read".  If
the disk then fails (or the net fails, or whatever), the application
would have acted incorrectly.  Moreover, instead of the application
learning about the failure by getting -1 back from a read call, it
will learn about it by receiving a signal or some such.

So, can anyone think of an application that behaves in this manner
(i.e.  acts upon the return value from a read by doing something
important, that does not involve examining the read buffer)?  I can't.
Perhaps more significant is the issue of the application not getting a
-1 back from the read call.

--
Topher Eliot
Data General Corporation                eliot@dg-rtp.dg.com
62 T. W. Alexander Drive               
{backbone}!mcnc!rti!dg-rtp!eliot
Research Triangle Park, NC 27709        (919) 248-6371
Obviously, I speak for myself, not for DG.

hunt@dg-rtp.dg.com (Greg Hunt) (09/01/90)

In article <1990Aug31.190751.12522@dg-rtp.dg.com>, eliot@dg-rtp.dg.com
(Topher Eliot) writes:
>
> But consider the reading case.  Here, we could tell the process how
> many bytes it had "gotten" (was going to get), even if it was less
> than the process had requested (presumably the kernel knows how big
> the file is, without having to read all those bytes off disk).  The
> application then might do something *other than examining the bytes
> it just "read"* based on this knowledge of a "successful read".  If
> the disk then fails (or the net fails, or whatever), the application
> would have acted incorrectly.  Moreover, instead of the application
> learning about the failure by getting -1 back from a read call, it
> will learn about it by receiving a signal or some such.
> 
> So, can anyone think of an application that behaves in this manner
> (i.e.  acts upon the return value from a read by doing something
> important, that does not involve examining the read buffer)?  I can't.
> Perhaps more significant is the issue of the application not getting a
> -1 back from the read call.
> 
> Topher Eliot
>

Yes, I can.  Under Data General's AOS/VS OS, I wrote a program that
read blocks from tape drives and checked the sizes of the blocks read.
The program was for verifying that labeled backup tapes were
physically readable.

The header label on the tape contained buffersize information which
the program used to read the data blocks.  As each block was read,
the size of the read returned by the OS was checked against the
buffersize to ensure that full buffer reads were done.  It also
counted the number of blocks read.  It discarded the contents of the
read buffer without looking at them.

The trailer label on the tape contained block count information that
was written by the OS.  The OS's block count was compared against the
block count seen by the program.

All of these checks were only to ensure that the tape could be
physically read.  Using it eliminated ALL bad backup tapes that I was
encountering.  Sometimes I found that I could write a tape, but not
read it again.  Tapes that could not be verified by this program were
discarded.  The program did nothing to ensure that the tape could
be logically read by the load program used to restore files, so it
did nothing to guard against bugs in the dump/load programs
themselves.

I have yet to port the program to DG/UX to verify UNIX backup tapes in
a similar manner.  I believe the program could be made to serve a
similar purpose, but I'd probably have to change the header/trailer
handling since AOS/VS uses ANSI standard tape labels and I don't think
that UNIX does.
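
For what it's worth, the core loop of a UNIX version might look roughly
like this (a sketch only; the label parsing is omitted, and the block
size is assumed to come from the header label):

    /* Sketch of the verification loop for a hypothetical UNIX port.
     * Assumes blocksize <= sizeof buf; block contents are discarded.
     */
    #include <unistd.h>

    int verify_tape(int tapefd, int blocksize, long expected_blocks)
    {
        static char buf[65536];
        long count = 0;
        int n;

        while ((n = read(tapefd, buf, blocksize)) > 0) {
            if (n != blocksize)          /* short block: tape is bad */
                return -1;
            count++;                     /* contents never examined  */
        }
        if (n < 0)                       /* physical read error      */
            return -1;
        return count == expected_blocks ? 0 : -1;
    }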

Does this example meet the behavior that you were wondering about?  It
may be a specialized use of the results of a read and not be
representative of what applications-level software does.

--
Greg Hunt                        Internet: hunt@dg-rtp.dg.com
DG/UX Kernel Development         UUCP:     {world}!mcnc!rti!dg-rtp!hunt
Data General Corporation
Research Triangle Park, NC       These opinions are mine, not DG's.

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (09/01/90)

In article <1990Aug31.190751.12522@dg-rtp.dg.com> eliot@dg-rtp.dg.com writes:
> I think this point very well covers the case of writing:  if you want
> to be sure it really got to disk, you need to do an fsync(), and even
> then I'm not sure you can be sure (doesn't the fsync just *mark* the
> buffers for writing out?).  We could certainly arrange for fsync to
> block until everything is really on disk.

fsync() will certainly do that, independently of this mechanism. (It's
sync() that just marks buffers for writing. BSD's fsync() truly writes
the data to disk, giving the transaction control you need for reliable
databases. I have no idea what you poor System V folks do.)
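
Which is why the usual transaction idiom is write() followed by
fsync() (a sketch; error handling abbreviated, fail() is a stand-in):

    /* The BSD idiom: don't call a record committed until fsync()
     * has returned, since write() alone only reaches the cache.
     */
    #include <unistd.h>

    extern void fail(const char *why);

    void commit_record(int dbfd, char *rec, int reclen)
    {
        if (write(dbfd, rec, reclen) != reclen)
            fail("write");
        if (fsync(dbfd) < 0)             /* blocks until physically on disk */
            fail("fsync");
        /* only now is it safe to report the transaction as done */
    }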

> But consider the reading case.
  [ what happens upon failure? ]

As Peter pointed out, this case is fatal. How many disk errors have you
had over the last year? How many did the programs involved recover from?
Yeah, thought so.

I guess you're right in principle: Steve's proposal is only completely
transparent for writing (which is the more important case anyway).

---Dan

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (09/02/90)

In article <960022@hpcljms.HP.COM>, brians@hpcljms.HP.COM (Brian Sullivan) writes:
>   Well, I suspect that this solution will not work with most C programs.
> Why?  Because most of the time read(2) and write(2) are called with a
> buffer that is an auto variable, i.e. stack allocated.

Surely, most of the time read(2) and write(2) are called by the <stdio>
implementation, and _that_ uses heap-allocated buffers.  You _can_ make
a stdio stream use a stack-allocated buffer, but it requires an explicit
call to setbuf() or one of its relatives.
-- 
You can lie with statistics ... but not to a statistician.

dag@gorgon.uucp (Daniel A. Glasser) (09/04/90)

[was x-posted in comp.unix.wizards and comp.lang.misc]

In article <1990Aug31.142906.26633@uncecs.edu> utoddl@uncecs.edu (Todd M. Lewis) writes:
>In article <31445.26dc0466@ccavax.camb.com> merriman@ccavax.camb.com writes:
>>Sure sounds like VMS QIO calls.
>Sounds like the Amiga's OS to me.  And UNIX doesn't do this?

Well, standard Unix doesn't do this.  VMS always has, as have RSX-11 and,
to some degree, even RT-11.  The Amiga OS is a rather recent invention, and
appears to use this old, tried-and-true technique.

For those not familiar with the QIO concept, a read or write QIO is issued
with a special call that allows the application to proceed and be notified
by either a signal (AST in VMS terminology) or an event flag.  When the
application needs to use the buffer it either checks the flag or waits until
the signal routine is executed.  Another QIO can be used to determine the
number of bytes read/written, the error status, etc.

This extends beyond doing solicited buffered I/O to unsolicited I/O.
In Unix, when a program wants to do processing in the background while
waiting for keyboard input, it either forks a child to do one or the
other, or polls the tty for input.  Under RSX or VMS, the program attaches
to the keyboard, specifies an unsolicited input Asynch. System Trap (AST)
service routine and goes about its business.  When the user types a key,
the main thread of execution is suspended and the AST service routine is called
to handle the keyboard input.  When done, the AST service routine returns
and the interrupted background processing continues as if nothing had happened.
When you get the hang of it, this is a very simple way to write programs.
I miss this capability in Unix, but have simulated it many times with
fork and signal.

As an extension to Unix, I would suggest the following:

  int writea(int fileno, void *buffer, size_t buflen, void (*complete_rtn)(int));
  int reada(int fileno, void *buffer, size_t buflen, void (*complete_rtn)(int));
	where fileno is the file number, buffer is a pointer to the
	bytes to be written/read, buflen is the number of bytes to
	be written/read from/to the buffer, and complete_rtn points
	to a function to be called when the read/write is complete.
	Maybe there should be an additional parameter or two -- flags
	modifying the actions, and a parameter to be passed to the
	completion routines.  The completion routines should take as
	a parameter the result of the write/read.  These routines return
	a status which indicates whether the async. read/write request
	is valid or not.  Care should be taken not to read/write
	the buffer area until the completion routine is called.

Since these are not changes to the existing read/write system calls,
dusty decks would not be broken; neither, of course, could they take
advantage of the new functionality.  (The above proposal is close to the
RT-11 completion routine scheme.)
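
Usage might look like this (a sketch; again, these calls are only the
proposal above, and the helper routines are stand-ins):

    /* Sketch of using the proposed reada(); the completion routine
     * receives the transfer's result, per the description above.
     */
    #include <stddef.h>

    extern int reada(int fileno, void *buffer, size_t buflen,
                     void (*complete_rtn)(int));
    extern void do_other_work(void);
    extern void use_data(char *buf, int n);

    static volatile int done_flag;
    static int nread;

    static void on_complete(int result)   /* called when the read finishes */
    {
        nread = result;
        done_flag = 1;
    }

    void async_read_example(int fd, char *buf, size_t len)
    {
        done_flag = 0;
        if (reada(fd, buf, len, on_complete) < 0)
            return;                        /* request itself was invalid */
        while (!done_flag)
            do_other_work();               /* buf is off-limits meanwhile */
        if (nread > 0)
            use_data(buf, nread);
    }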

If this debate goes on much longer, I'd suggest that it be removed
from comp.lang.misc, since this is not a language-dependent issue.
-- 
Daniel A. Glasser                       One of those things that goes
dag%gorgon@persoft.com                  "BUMP! (ouch!)" in the night.

peter@ficc.ferranti.com (Peter da Silva) (09/05/90)

In article <59247@bbn.BBN.COM> pplacewa@bbn.com (Paul W Placeway) writes:
> peter@ficc.ferranti.com (Peter da Silva) writes:
> < ... Disk errors are a very rare occurrence, and almost always require
> < human intervention anyway. Any other return value for read is known
> < ahead of time.

> Unless your "disk" is RFS and the remote machine crashes, or soft
> mounted NFS and any one of about a zillion things happens...

This is an optimisation. You don't have to enable it for network
file systems, or raw disk devices, or whatever else. Most programs
can't recover from a disk fault, or a network fault, so blowing the
program away when you fault to the bogus page is perfectly legitimate.
Programs that *can* recover can either be run in a lower performance
mode or be rewritten to handle this error.
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
peter@ferranti.com

peter@ficc.ferranti.com (Peter da Silva) (09/05/90)

In article <1990Aug31.190751.12522@dg-rtp.dg.com> eliot@dg-rtp.dg.com writes:
> So, can anyone think of an application that behaves in this manner
> (i.e.  acts upon the return value from a read by doing something
> important, that does not involve examining the read buffer)?  I can't.

Any program that reads a whole file at a time, like GNU Emacs or TCL.
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
peter@ferranti.com