garry@batcomputer.tn.cornell.edu (Garry Wiegand) (12/18/86)
I have an application which entails computing and churning out vast
quantities of data and, for speed, I'd like to have the I/O happening
in parallel with the computing. After reading the BSD and SysV manuals,
I'm puzzled: does the system give us *any way* to do asynchronous output?

I've thought of writing a (presumably buffered) pipe to "cat" and thence to
my device. Is there anything else?

thanks -

[apologies if an incorrect posting on this (I just cancelled it) sneaks through.]

garry wiegand (garry%cadif-oak@cu-arpa.cs.cornell.edu)
ggs@ulysses.homer.nj.att.com (Griff Smith) (12/18/86)
In article <1858@batcomputer.tn.cornell.edu>, garry@batcomputer.UUCP writes:
> I have an application which entails computing and churning out vast
> quantities of data and, for speed, I'd like to have the I/O happening
> in parallel with the computing. After reading the BSD and SysV manuals,
> I'm puzzled: does the system give us *any way* to do asynchronous output?
>
> I've thought of writing a (presumably buffered) pipe to "cat" and thence to
> my device. Is there anything else?
>
> garry wiegand (garry%cadif-oak@cu-arpa.cs.cornell.edu)

It depends on who else is using the computer, where you are putting your
data, etc. If you are writing to a disk file, don't worry; the disk system
is buffered internally. If you are writing to a mag tape and you are the
only one on the system, you might want to fork a tape-writing process. If
you are using System V, you could even use shared memory to pass data to
the process without writing or copying it. If shared memory isn't
available, try a pipe.

Remember, though, that pipes aren't free. The time it takes for one process
to write the pipe and another to read it may be larger than the time spent
waiting for the tape to spin. If the system is heavily loaded, complicated
multi-process buffering schemes may slow you down; the system will just run
something simpler instead. Your best strategy is usually to minimize your
own CPU requirements so you make the best use of the fraction of the CPU
allocated to you.

Of course, if you have the machine to yourself, the only thing that matters
is elapsed time, not CPU time. In that case, you might consider running
something else useful to soak up the idle time. You still won't get the
primary job done any faster, but the total productivity of the system
should improve.

After an experience I had about 7 years ago, I am not very enthusiastic
about asynchronous I/O in time-sharing systems. We were using TOPS-10,
which had a buffering scheme for asynchronous I/O.
After having many problems in the kernel caused by cache maintenance errors during asynchronous I/O, we defeated the asynchronous feature. The system itself was still highly asynchronous, but not within processes. Users didn't know the difference but the system did: improved system stability, better system throughput, faster disk performance, less disk fragmentation.
philip@axis.UUCP (Philip Peake) (12/20/86)
In article <1858@batcomputer.tn.cornell.edu> garry%cadif-oak@cu-arpa.cs.cornell.edu writes:
>I have an application which entails computing and churning out vast
>quantities of data and, for speed, I'd like to have the I/O happening
>in parallel with the computing. After reading the BSD and SysV manuals,
>I'm puzzled: does the system give us *any way* to do asynchronous output?
>
>I've thought of writing a (presumably buffered) pipe to "cat" and thence to
>my device. Is there anything else?

You don't seem to have read your manuals too well. ALL (normal) I/O
activity under UNIX is asynchronous. When you do a write(), the data is
copied from the data area of your program into a buffer (or clist
structure) in the kernel data space. The write() then returns. The buffered
data is output either by a DMA transfer or by interrupt-driven routines
within the kernel, depending upon the device to which you are writing.

Philip
mangler@cit-vax.Caltech.Edu (System Mangler) (12/20/86)
In article <1858@batcomputer.tn.cornell.edu>, garry@batcomputer.UUCP writes:
> I have an application which entails computing and churning out vast
> quantities of data and, for speed, I'd like to have the I/O happening
> in parallel with the computing.

Many computations go through a "read the data, crunch it, output it" cycle
in which the crunching of one block is independent of the crunching of
another. If that's your case, use three processes. At any given time, one
is reading, another is crunching what it just read, and one is writing.
You'll get a fairly continuous flow of data, won't have to pass around
large volumes of data (i.e. low overhead), and get a large share of the
CPU.

One curious gotcha is that throughput will be substantially worse if you
run it "nice --20". Explain that, BSD scheduler wizards! (I noticed this on
4.3bsd dump, which works in precisely this fashion.)

Don Speck   speck@vlsi.caltech.edu   {seismo,rutgers,ames}!cit-vax!speck