[comp.unix.wizards] Asynchronous I/O under UNIX

peterson@crash.cts.com (John Peterson) (12/19/89)

I am developing code to LU-factor large symmetric matrices on an FPS
522 (running BSD 4.3) that are too large to fit in memory. The
mechanics of "out of core" algorithms are pretty straightforward, but
to be efficient the I/O should be done asynchronously. There seems to
be no really painless way to do this under UNIX.

   My colleagues and I have worked out a rough sketch of a way of doing
asynchronous I/O. One would fork off a copy of the process; the child
would 'nap' until an I/O request came from the parent. Upon receipt of
a request, the child would go off and issue an ordinary synchronous
I/O call, then set a flag of some sort when the I/O had completed. The
data to be moved would be stored in memory accessible to both parent
and child, probably using System V shared memory.

   Does anyone have a better scheme for doing this? Has any general
implementation of this been posted to any of the other newsgroups?
E-mail would be the best way to reply.

Thanks in advance, John P.

-- 
+--------------------------------------+
|  John C. Peterson                    |
|  UUCP: { nosc ucsd }!crash!peterson  |
|  ARPA: crash!peterson@nosc.mil       |
+--------------------------------------+

m5@lynx.uucp (Mike McNally) (12/20/89)

peterson@crash.cts.com (John Peterson) writes:

>   My colleagues and I have worked out a rough sketch of a way of doing
>asynchronous I/O. One would fork off a copy of the process; the child
>would 'nap' until an I/O request came from the parent. Upon receipt of
>a request, the child would go off and issue an ordinary synchronous
>I/O call, then set a flag of some sort when the I/O had completed. The
>data to be moved would be stored in memory accessible to both parent
>and child, probably using System V shared memory.

Sounds OK to me, if you're willing to swallow the cost of starting a
new process for each I/O transaction.  Of course, when the world gets
"threads" or "lightweight tasks" or whatever the current buzzword is,
this gets much cheaper; in fact, if I had threads, I don't think I'd
want or need a separate kernel-supported async I/O mechanism.

-- 
Mike McNally                                    Lynx Real-Time Systems
uucp: {voder,athsys}!lynx!m5                    phone: 408 370 2233

            Where equal mind and contest equal, go.

martin@mwtech.UUCP (Martin Weitzel) (12/20/89)

In article <932@crash.cts.com> peterson@crash.cts.com (John C. Peterson) writes:
>
>I am developing code to LU-factor large symmetric matrices on an FPS
>522 (running BSD 4.3) that are too large to fit in memory. The
>mechanics of "out of core" algorithms are pretty straightforward, but
>to be efficient the I/O should be done asynchronously. There seems to
>be no really painless way to do this under UNIX.
[rest deleted]

By its very nature, I/O to disk under UNIX is often more 'asynchronous'
than it appears to be:
- A write system call returns once the blocks have been placed in
  the disk cache (Rochkind's book "Advanced UNIX Programming" has
  a very nice little section about this on page 29 - read and
  enjoy). The physical I/O may well overlap with CPU activity.
- A read system call starts some machinery in the kernel that
  tries to decide whether sequential reads are being made and, if
  this seems to be true, carries out 'read aheads', which often
  have the same effect as asynchronous reads.

Of course, a program can sometimes decide more efficiently than
the kernel, especially if reads are not done sequentially, but
it often seems that UNIX programmers want to take too much
responsibility for efficient I/O. (This is more a general remark
than a criticism of the poster's ideas.)

KEEP IN MIND: These are no longer the days of 'single job
batch processing', when it was the programmer's responsibility
to keep the CPU and the disk equally loaded. The UNIX kernel
does quite a good job in this respect - maybe not always the
best, but far better than many programmers think!
-- 
Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83

jas@postgres.uucp (James Shankland) (12/22/89)

In article <6679@lynx.UUCP> m5@lynx.uucp (Mike McNally) writes:
>peterson@crash.cts.com (John Peterson) writes:

>... in fact, if I had threads, I don't think I'd
>want or need a separate kernel-supported async I/O mechanism.

Yes!  I strongly agree, and it sure is nice to hear someone else taking
this position.  I consider asynchronous I/O a hack to compensate for the
absence of sufficiently lightweight processes.  Of course, one person's
spartan elegance is another's semantic impoverishment, but I consider
it a cleaner, clearer programming model.

jas

archer@elysium.esd.sgi.com (Archer Sully) (12/28/89)

In article <20880@pasteur.Berkeley.EDU>, jas@postgres.uucp (James
Shankland) writes:
> In article <6679@lynx.UUCP> m5@lynx.uucp (Mike McNally) writes:
> >peterson@crash.cts.com (John Peterson) writes:
> 
> >... in fact, if I had threads, I don't think I'd
> >want or need a separate kernel-supported async I/O mechanism.
> 
> Yes!  I strongly agree, and it sure is nice to hear someone else taking
> this position.  I consider asynchronous I/O a hack to compensate for the
> absence of sufficiently lightweight processes.  Of course, one person's
> spartan elegance is another's semantic impoverishment, but I consider
> it a cleaner, clearer programming model.
> 
> jas

I've done this on IRIX (Silicon Graphics hacks....er IMPROVEMENTS to
SYSV :-), and it works OK.  The benefit turns out to be highly dependent
on exactly what you are doing, and there are, of course, lots of funky
limitations, but it does work.

If anyone wants it I may even be able to dig it up.

Archer Sully 	  | A Mind is a Terrible thing to Taste
(archer@sgi.com)  |		- Ministry

lm@snafu.Sun.COM (Larry McVoy) (12/29/89)

peterson@crash.cts.com (John Peterson) writes:
>   My colleagues and I have worked out a rough sketch of a way of doing
>asynchronous I/O. One would fork off a copy of the process; the child
>would 'nap' until an I/O request came from the parent. Upon receipt of
>a request, the child would go off and issue an ordinary synchronous
>I/O call, then set a flag of some sort when the I/O had completed. The
>data to be moved would be stored in memory accessible to both parent
>and child, probably using System V shared memory.

Yeah, this will work.  A couple things to note:

(1) This is a bad idea for writes, especially under SunOS 4.x.  See
    (2), (3), (4) below.

    It's a great idea for reads.  Especially if you do it right.  I would
    keep a pool of processes around - i.e., don't do a fork per read,
    do a fork iff you haven't got someone hanging around (forks are not
    cheap, contrary to popular opinion).  Also, let read ahead work for
    you. Oh, yeah, do yourself a favor and valloc() your buffers rather
    than allocating space off the stack.  It won't help you now, but
    I'm looking at ways of making I/O go fast and one game I can play
    will only work if you give me a page aligned buffer.  And use mmap()
    if you can.  It's much nicer than sys5 shm and it's in 5.4.

(2) Writes are already async, especially so on SunOS 4.x.  I think it
    is limited by segmap, which is around 4 megs.  On buffer-cache UNIXes,
    you'll be limited to the size of the buffer cache (no kidding), which
    is fairly small, around 10-20% of memory.

(3) Having lots of outstanding writes doesn't buy you very much.  In fact,
    it can really lead to weird behavior.  Everyone should know that (on
    simple controllers, at least) writes go through disk sort, including
    synchronous writes (NFS is a heavy user of sync writes).  Well, given
    that you go through disk sort, you won't ever get to starvation (i.e.,
    a buffer will get written out), but you can get to something I call
    being very hungry.  Suppose you have a disk queue that starts out
    with requests for cyl 0 and 100.  Then suppose you do a series of
    writes onto cyls >= 0 but < 100.  The buffer waiting for cyl 100
    will wait until all of those I/Os (that came in after it did)
    complete.  That buffer waiting for 100 is in the "hungry" state.

    Fortunately, this doesn't happen very often.  Traces I've taken indicate
    that disk requests (due to the BSD fs) are nicely grouped.  You have to
    have lots of busy processes doing unrelated I/O to get into this state.
    I suspect the async i/o could hit this problem.

(4) Those outstanding writes cost memory.  You have to grab the user's data
    before saying "I'm done".  SunOS 4.x claims this is a feature; "our
    writes finish faster than your writes, especially for big ones" seems
    to be the party line.  Well, for what I do this is a waste of memory,
    so I run a hacked version of ufs that limits outstanding writes
    (mail me if you have src and want to try this - it's trivial to
    implement and tunable.  I'd be interested in outside comments).

(5) Reads could work really well.

	 What I say is my opinion.  I am not paid to speak for Sun.

Larry McVoy, Sun Microsystems                          ...!sun!lm or lm@sun.com

rec@dg.dg.com (Robert Cousins) (12/29/89)

I don't understand what all of the difficulty is.  Several UNIX 
and similar operating systems support both sync and async file I/O.
In fact, for those of you who are interested, DG/UX does support
both synchronous and asynchronous I/O.  Furthermore, the semantics are
such that it is straightforward to use these features without many of
the shared sync/async gotchas.

Robert Cousins
Dept. Mgr, Workstation Dev't.
Data General Corp.

Speaking for myself alone.