steve@nuchat.UUCP (Steve Nuchia) (08/29/90)
On the subject of asynchronous I/O in Unix: I've come up with what I consider a rather slick way of making it fit neatly into Unix's way of doing things: Have read(2) and write(2) calls map the pages containing the buffers out of the user address space and return immediately. Once the data have been copied (DMAed?) to/from the buffers, map the pages back in. A user program that is not aware of the subterfuge will then run along for some (probably short) time and trap on an attempt to refill or inspect the buffer. It will then be blocked until the request completes. A savvy program will do something else for as long as it can, then take a peek at the buffer when it has run out of busy work. One would probably also provide (grudgingly, in my case) an explicit call for discovering the status. (Note that such a call might be useful to a program that wished to control its paging behaviour if it were written with sufficient generality.) The scheme will only provide asynchronicity in cases where the return value for the read or write call is known early. This will be the case primarily for files, but other cases can be made to take advantage of it. The performance characteristics would be similar to using mmap but would apply to programs written in normal unix style and to dusty decks. One must of course take some care in implementing the scheme, and there are no doubt the usual raft of gotchas that come up when doing anything involving memory management. The case of write(1,buf,read(0,buf,sizeof(buf))) is entertaining to contemplate for instance. Good performance with V7 file system call semantics. Programs work whether the feature is in the kernel or not and whether they are written to take advantage of it or not. I sure wish I had thought of it a long time ago (like while the standards were still soft). I would appreciate any comments that wizards with more kernel internals experience might have. 
If I've rediscovered something well-known again I think I shall slit my wrists. (For completeness I will note that to use this scheme intelligently you must be able to discover the relevant properties of the memory management implementation. This is nothing new for high performance programs in a paged environment, but unless it's been added recently there isn't a standard way to do it. Whether this is properly a language or a system interface issue is best left to another debate.) -- Steve Nuchia South Coast Computing Services (713) 964-2462 "To learn which questions are unanswerable, and _not_to_answer_them; this skill is most needful in times of stress and darkness." Ursula LeGuin, _The_Left_Hand_of_Darkness_
rsc@merit.edu (Richard Conto) (08/30/90)
In article <27619@nuchat.UUCP> steve@nuchat.UUCP (Steve Nuchia) writes: >On the subject of asynchronous I/O in Unix: I've come up with >what I consider a rather slick way of making it fit neatly >into Unix's way of doing things: > >Have read(2) and write(2) calls map the pages containing the buffers >out of the user address space and return immediately. Once the >data have been copied (DMAed?) to/from the buffers, map the pages back in. > >A user program that is not aware of the subterfuge will then run >along for some (probably short) time and trap on an attempt to >refill or inspect the buffer. It will then be blocked until >the request completes. A savvy program will do something else >for as long as it can, then take a peek at the buffer when it >has run out of busy work. One would probably also provide >(grudgingly, in my case) an explicit call for discovering the status. A buffer is not necessarily aligned on a page boundary. And a page may contain more than one variable. The savvy program would have to design its data structures (including local variable arrangement, if a buffer happens to be there) to be aware of whatever peculiar way the compiler lays out variables and whatever peculiar granularity the OS has for pages. Make it simpler. Have a routine that requests an I/O operation. Another routine that can check its status. A way of specifying a routine to be called when the I/O operation completes might be yet another option. I'm afraid that your idea adds unnecessary complexity (and system dependencies). And using constructs like 'write(fdout,buf,read(fdin,buf,sizeof(buf)))' is asking for trouble when 'read()' returns an error condition. --- Richard
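[Editor's sketch] The trouble with the nested write/read construct is that read() returns -1 on error, and that value, converted to a very large unsigned count, becomes the length handed to write(). Below is a minimal sketch in C of the same copy with the two calls separated and each return value checked. The name copy_once is made up for this illustration; it is not from any poster in the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Copy at most one buffer's worth of data from fdin to fdout.
 * Returns the number of bytes copied, 0 at end of file, -1 on error.
 * copy_once is a hypothetical helper name used only in this sketch. */
ssize_t copy_once(int fdin, int fdout, char *buf, size_t bufsize)
{
    ssize_t nread;
    ssize_t done = 0;

    assert(buf != NULL && bufsize > 0);

    /* Retry the read if a signal interrupts it before any data arrives. */
    do {
        nread = read(fdin, buf, bufsize);
    } while (nread < 0 && errno == EINTR);

    if (nread <= 0)
        return nread;   /* 0 = EOF, -1 = error; neither reaches write() */

    /* write() may accept fewer bytes than requested, so loop until done. */
    while (done < nread) {
        ssize_t nwritten = write(fdout, buf + done, (size_t)(nread - done));
        if (nwritten < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        done += nwritten;
    }
    return done;
}
```

A caller would loop on copy_once(0, 1, buf, sizeof buf) until it returns 0 or -1.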
jlg@lanl.gov (Jim Giles) (08/30/90)
From article <27619@nuchat.UUCP>, by steve@nuchat.UUCP (Steve Nuchia): > On the subject of asynchronous I/O in Unix: I've come up with > what I consider a rather slick way of making it fit neatly > into Unix's way of doing things: > > Have read(2) and write(2) calls map the pages containing the buffers > out of the user address space and return immediately. Once the > data have been copied (DMAed?) to/from the buffers, map the pages back in. > [...] Yes, this will work. I believe that MACH already does this. Unfortunately, this idea has two problems: 1) not all machines are paged/segmented; 2) not all I/O requests are a multiple of the pagesize. The first problem is more severe - hardware designers avoid pages/segments when designing for speed. The extra hardware costs about 10% in speed, or about that much in additional hardware cost. So they are avoided (Crays don't have pages or segments). The pagesize problem just means that you'd have to map out more memory than is actually involved in the I/O request. This means that the user might get blocked on memory that is really perfectly safe to access - a minor source of slowdown. J. Giles
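[Editor's sketch] The pagesize problem can be made concrete with a little address arithmetic. The C sketch below computes the smallest page-aligned range that covers a buffer, and how many bytes outside the transfer would be mapped out along with it. The names page_span, covering_pages and collateral_bytes are invented for this illustration, and the page size is assumed to be a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A page-aligned address range: start address and length in bytes. */
struct page_span {
    uintptr_t start;
    size_t length;
};

/* Smallest page-aligned range covering the len bytes starting at buf.
 * Assumes len > 0 and page_size is a power of two. */
struct page_span covering_pages(const void *buf, size_t len, size_t page_size)
{
    struct page_span span;
    uintptr_t mask, first, last;

    assert(len > 0);
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    mask = ~(uintptr_t)(page_size - 1);
    first = (uintptr_t)buf & mask;              /* page holding first byte */
    last = ((uintptr_t)buf + len - 1) & mask;   /* page holding last byte  */

    span.start = first;
    span.length = (size_t)(last - first) + page_size;
    return span;
}

/* Bytes that would be mapped out although they are not part of the I/O. */
size_t collateral_bytes(const void *buf, size_t len, size_t page_size)
{
    return covering_pages(buf, len, page_size).length - len;
}
```

With 4096-byte pages, a 200-byte buffer that straddles a page boundary takes two whole pages with it, while a page-aligned 4096-byte buffer takes nothing extra. On a POSIX system the actual page size can be obtained from sysconf(_SC_PAGESIZE).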
merriman@ccavax.camb.com (08/30/90)
In article <1990Aug29.170931.10853@terminator.cc.umich.edu>, rsc@merit.edu (Richard Conto) writes: > > Make it simpler. Have a routine that requests an I/O operation. Another > routine that can check its status. A way of specifying a routine to be > called when the I/O operation completes might be yet another option. Sure sounds like VMS QIO calls.
bdsz@cbnewsl.att.com (bruce.d.szablak) (08/30/90)
In article <1990Aug29.170931.10853@terminator.cc.umich.edu>, rsc@merit.edu (Richard Conto) writes: > In article <27619@nuchat.UUCP> steve@nuchat.UUCP (Steve Nuchia) writes: > >Have read(2) and write(2) calls map the pages containing the buffers > >out of the user address space and return immediately. > > A buffer is not necessarily aligned on a page boundary. And a page > may contain more than one variable. Actually, the OS only has to mark the pages as copy on write. This sort of thing is often done when a process forks to avoid making a copy of the data space for the child. Whether it's worth it is another matter.
lfd@cbnewsm.att.com (leland.f.derbenwick) (08/31/90)
In article <27619@nuchat.UUCP>, steve@nuchat.UUCP (Steve Nuchia) writes: > On the subject of asynchronous I/O in Unix: I've come up with > what I consider a rather slick way of making it fit neatly > into Unix's way of doing things: > > Have read(2) and write(2) calls map the pages containing the buffers > out of the user address space and return immediately. Once the > data have been copied (DMAed?) to/from the buffers, map the pages back in. > > A user program that is not aware of the subterfuge will then run > along for some (probably short) time and trap on an attempt to > refill or inspect the buffer. It will then be blocked until > the request completes. A savvy program will do something else > for as long as it can, then take a peek at the buffer when it > has run out of busy work. One would probably also provide > (grudgingly, in my case) an explicit call for discovering the status. Apart from the implementation problems that others have mentioned, _this suggestion breaks existing code_. In essentially any serious database application, a completed write() to a raw disk is treated as a guarantee that the data block has been _physically written to the device_. (This is needed to ensure reliable transaction behavior in the presence of potential system crashes.) Since your suggestion would void that guarantee, it is not benign. On the other hand, I like your idea of implementing asynchronous behavior using the ordinary read() and write() calls. So how difficult would it be to add a couple ioctl's to the existing raw disk driver to support that? One ioctl would select sync/async reads/writes (the default would be the present behavior: sync read, sync write). The other ioctl would do the status inquiry. With these, asynchronous behavior is available on demand, and the OS doesn't need to jump through any hoops to make it transparent: it's up to the user to use the facility properly. 
This is a lot cleaner than implementing asynchronous I/O in user mode with shared memory and a background process... -- Speaking strictly for myself, -- Lee Derbenwick, AT&T Bell Laboratories, Warren, NJ -- lfd@cbnewsm.ATT.COM or <wherever>!att!cbnewsm!lfd
utoddl@uncecs.edu (Todd M. Lewis) (08/31/90)
In article <31445.26dc0466@ccavax.camb.com> merriman@ccavax.camb.com writes: >In article <1990Aug29.170931.10853@terminator.cc.umich.edu>, > rsc@merit.edu (Richard Conto) writes: > >> >> Make it simpler. Have a routine that requests an I/O operation. Another >> routine that can check its status. A way of specifying a routine to be >> called when the I/O operation completes might be yet another option. > >Sure sounds like VMS QIO calls. Sounds like the Amiga's OS to me. And UNIX doesn't do this? I'm trying to be a UNIX nut in training, but I keep hearing about these new tricks that seem to be rather hard to teach the old dog. I'd hate to wake up in 5 years and realize that UNIX had become to workstations what MS-DOS is to PCs now. Somebody pinch me.
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (09/01/90)
In article <1990Aug30.222226.20866@cbnewsm.att.com> lfd@cbnewsm.att.com (leland.f.derbenwick) writes: > In article <27619@nuchat.UUCP>, steve@nuchat.UUCP (Steve Nuchia) writes: > > Have read(2) and write(2) calls map the pages containing the buffers > > out of the user address space and return immediately. Once the > > data have been copied (DMAed?) to/from the buffers, map the pages back in. > Apart from the implementation problems that others have mentioned, > _this suggestion breaks existing code_. No, it does not. On many paged machines, an implementation of Steve's suggestion takes virtually [sic] no time. The worst that happens is your original efficiency. The best that happens is a noticeable speedup, especially of pipe read-writes. A program that uses, say, a getpage() call to allocate a page-aligned buffer can guarantee the best case. > In essentially any serious database application, a completed > write() to a raw disk is treated as a guarantee that the data > block has been _physically written to the device_. No. Any database application that claims to recover after crashes without fsync()ing its write()s is lying. (This says some interesting things about certain System V database programs.) ---Dan
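[Editor's sketch] The getpage() call mentioned above is the poster's hypothetical. On a present-day POSIX system a page-aligned buffer can be obtained with posix_memalign() and the page size from sysconf(), as in the C sketch below. The wrapper name alloc_page_buffer is invented here, and whether a given kernel would actually take the fast path for such a buffer is an assumption of the proposal, not something this code can guarantee.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Allocate npages whole pages, aligned on a page boundary.
 * Stores the size in bytes in *size_out when size_out is not NULL.
 * Returns NULL on failure; release the buffer with free().
 * alloc_page_buffer is a hypothetical name used only in this sketch. */
void *alloc_page_buffer(size_t npages, size_t *size_out)
{
    long page = sysconf(_SC_PAGESIZE);
    void *buf = NULL;
    size_t total;

    if (page <= 0 || npages == 0)
        return NULL;
    if (npages > SIZE_MAX / (size_t)page)
        return NULL;                    /* size would overflow */

    total = npages * (size_t)page;
    if (posix_memalign(&buf, (size_t)page, total) != 0)
        return NULL;

    /* Sanity check: the buffer starts exactly on a page boundary. */
    assert((uintptr_t)buf % (uintptr_t)page == 0);

    if (size_out != NULL)
        *size_out = total;
    return buf;
}
```

A buffer obtained this way shares its pages with no other variable, which is the property the best case depends on.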
meissner@osf.org (Michael Meissner) (09/01/90)
In article <29290:Aug3120:10:5590@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes: | No. Any database application that claims to recover after crashes | without fsync()ing its write()s is lying. (This says some interesting | things about certain System V database programs.) Ah, but System V has O_SYNC, which does do the fsync after every write (or so the man page claims...). -- Michael Meissner email: meissner@osf.org phone: 617-621-8861 Open Software Foundation, 11 Cambridge Center, Cambridge, MA, 02142 Do apple growers tell their kids money doesn't grow on bushes?
bzs@world.std.com (Barry Shein) (09/01/90)
Do there exist any benchmark or other test results which indicate that adding asynch i/o to unix actually yields a performance improvement? Papers, pointers, etc appreciated. I am not interested in results from other operating systems, I don't believe they would have any applicability to the question. However, very informal results would be appreciated. -- -Barry Shein Software Tool & Die | {xylogics,uunet}!world!bzs | bzs@world.std.com Purveyors to the Trade | Voice: 617-739-0202 | Login: 617-739-WRLD
stripes@eng.umd.edu (Joshua Osborne) (09/02/90)
In article <1990Aug30.222226.20866@cbnewsm.att.com> lfd@cbnewsm.att.com (leland.f.derbenwick) writes: >Apart from the implementation problems that others have mentioned, >_this suggestion breaks existing code_. > >In essentially any serious database application, a completed >write() to a raw disk is treated as a guarantee that the data >block has been _physically written to the device_. (This is >needed to ensure reliable transaction behavior in the presence >of potential system crashes.) Since your suggestion would void >that guarantee, it is not benign. Then that program is quite broken. Unix guarantees no such thing. If you want it you need to use fsync(filno), or open the file in sync mode. Currently Unix copies data to write into its disk buffers, returns control to the user and doesn't write them until it is forced to (sync, fsync, buffer shortage) or decides that it is a good time to write. >One ioctl would select sync/async reads/writes (the default would >be the present behavior: sync read, sync write). The other ioctl >would do the status inquiry. With these, asynchronous behavior >is available on demand, and the OS doesn't need to jump through >any hoops to make it transparent: it's up to the user to use the >facility properly. The default should be async for both read & write, because the default write is already async & the async read would be transparent. There should be a way to select sync read/write on a file by file basis 'tho. -- stripes@eng.umd.edu "Security for Unix is like Josh_Osborne@Real_World,The Mutitasking for MS-DOS" "The dyslexic porgramer" - Kevin Lockwood "Isn't that a shell script?" - D. MacKenzie "Yeah, kinda sticks out like a sore thumb in the middle of a kernel" - K. Lidl
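[Editor's sketch] For ordinary files that go through the buffer cache, the two options named above look roughly like this in C: either write and then call fsync(), or open the file with O_SYNC so that each write() waits for the data to reach the device. The helper names write_durably and open_sync are invented for this illustration, and the sketch says nothing about raw devices.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Write all len bytes to fd, then ask the kernel to flush them to the
 * device with fsync(). Returns 0 on success, -1 on error.
 * write_durably is a hypothetical name used only in this sketch. */
int write_durably(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t done = 0;

    assert(buf != NULL || len == 0);

    /* write() may be interrupted or accept a short count, so loop. */
    while (done < len) {
        ssize_t n = write(fd, p + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        done += (size_t)n;
    }

    /* Without this call the data may still sit in the buffer cache. */
    if (fsync(fd) != 0)
        return -1;
    return 0;
}

/* Alternative: open in sync mode so that every write() is synchronous.
 * Creates or truncates the file. open_sync is also a hypothetical name. */
int open_sync(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0600);
}
```

The first form lets a program batch several writes before paying for one flush; the second pays on every write() but needs no extra calls at the point of use.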
chris@mimsy.umd.edu (Chris Torek) (09/02/90)
>In article <1990Aug30.222226.20866@cbnewsm.att.com> lfd@cbnewsm.att.com (leland.f.derbenwick) writes: >>In essentially any serious database application, a completed >>write() to a raw disk is treated as a guarantee that the data >>block has been _physically written to the device_. ... In article <1990Sep1.185221.8718@eng.umd.edu> stripes@eng.umd.edu (Joshua Osborne) writes: >Unix guarantees no such thing. If you want it you need to use >fsync(filno), or open the file in sync mode. Currently Unix copies >data to write into its disk buffers, .... Look again: he said `raw disk'. Raw I/O calls physio; physio calls vslock (in 4BSD anyway); vslock pages in and locks in core all the memory needed for the transfer; physio calls physstrat or the device strategy routine (depending on the particular variant of 4BSD); physstrat (if it exists) calls the device strategy routine; the device routine queues the transfer and, if necessary, starts the device, then returns; and then physio/physstrat *WAITS*. Finally, physio calls vsunlock (which may also mark the pages as modified) and returns. It would be useful to be able to start raw transfers without waiting. I once (actually, twice) wrote a driver that did this. Sort of a hack, but it worked. It required changes to vsunlock() (to allow it to be called at interrupt time) and exit() (to avoid throwing away the process VM until the device close routines finished up). It would be better to do this more directly, though. -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 405 2750) Domain: chris@cs.umd.edu Path: uunet!mimsy!chris
jc@minya.UUCP (John Chambers) (09/03/90)
In article <BZS.90Aug31173255@world.std.com>, bzs@world.std.com (Barry Shein) writes: > > Do there exist any benchmark or other test results which indicate that > adding asynch i/o to unix actually yields a performance improvement? Well, I don't have the papers any more, and in any case, they were internal documents at the company I was working for, but I did a study along this line about 5 years ago. We used an assortment of Sys/V, Sys/III, and BSD systems, including a couple (e.g., Masscomp) that implemented contiguous files. I wrote a set of test programs that did various patterns of file access (sequential, random, random-followed-by-sequential, etc.), and also tested a number of the company's existing applications. Files were opened with/without O_SYNC, and were contiguous/allocated on different tests. The tests were run both alone and together with other applications, giving a total of 8 combinations for each test, which were run long enough to give statistically significant results. The results were disappointing for those that wanted these features. Most of the tests showed no significant differences among the combinations. In the few cases where there was a difference, the "normal" case (no syncing, not contiguous) was the winner by a small margin. It was particularly interesting that we couldn't find a single application that ran faster with contiguous files than with normal files. I'm sure that some exist, but we couldn't construct them. I was less surprised that automatic syncing didn't benefit anyone; I had predicted that. After all, forcing a block to be written causes it to be marked "clean", so it becomes a good candidate for re-use if buffer space is low. As a result, such files tend to have a somewhat smaller fraction of their data in buffers, so random reads are somewhat less likely to have a hit. It's not big, but for random I/O, it is measurable. 
It'd be interesting to hear of cases where these features are worth their price (in kernel code, programmer time, etc.). -- Zippy-Says: Imagine ... a world without clothing folds, chiaroscuro, or marital difficulties ... Home: 1-617-484-6393 Work: 1-508-952-3274 Uucp: ...!{harvard.edu,ima.com,eddie.mit.edu,ora.com}!minya!jc (John Chambers) Uucp-map: minya adelie(DEAD)
peter@ficc.ferranti.com (Peter da Silva) (09/06/90)
In article <27813@nuchat.UUCP> steve@nuchat.UUCP (Steve Nuchia) writes: > An excellent point, one we would all do well to keep in mind. I would > have added to Lester's list of examples the event-driven style imposed > by modern user interface construction. ^^^^^^ Event loops are basically single loop control systems, such as are found in the simplest of embedded controllers: microwave ovens, for example. For them to have become synonymous with modern user interfaces borders on the obscene. The best way to implement a modern user interface is with multiple loops of control, such as Haitex' spreadsheet "Haicalc" on the Commodore Amiga... Or, for a workstation environment, in NeWS. > AAAARRRRRRRGGGGHHHH!!!!! Sympathy. -- Peter da Silva. `-_-' +1 713 274 5180. 'U` peter@ferranti.com