corre@csd4.csd.uwm.edu (Alan D Corre) (08/15/90)
Although I have been involved in programming for quite a while, my background is humanistic, so please excuse this query if it displays a certain ignorance. I should like to know the following. If I am writing a program which processes a long text, does it matter, in terms of (1) program efficiency and (2) wear and tear on disk drives, whether I read a line from a disk file, process it, and write it out to an output file, as opposed to reading a whole bunch of lines into memory, processing them, and then writing them out en bloc to the output file? In other words, which is preferable:

(1) while not end of input file do {
        read a line into a variable
        ...process the line...
        write the line to output file
    }

(2) set up a list or array of, say, 100 elements
    for 100 times read a line into the array
    ...process the array...
    for 100 times write the array elements to output file

...or doesn't it make any difference? Thank you.

--
Alan D. Corre
Department of Hebrew Studies
University of Wisconsin-Milwaukee (414) 229-4245
PO Box 413, Milwaukee, WI 53201
corre@csd4.csd.uwm.edu
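For readers who want to try the comparison, here is a minimal sketch of the two strategies in the question, written in Python. The function names, the `transform` parameter, and the 250-line test file are illustrative assumptions, not part of the original post. Python's `open()` already buffers file I/O, so in this sketch both versions should reach the operating system in large blocks rather than once per line.

```python
import itertools
import os
import tempfile


def copy_line_by_line(src_path, dst_path, transform):
    """Strategy (1): read one line, process it, write it."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(transform(line))


def copy_in_batches(src_path, dst_path, transform, batch_size=100):
    """Strategy (2): read up to batch_size lines, process them, write them en bloc."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        while True:
            batch = list(itertools.islice(src, batch_size))
            if not batch:
                break
            dst.writelines(transform(line) for line in batch)


def demo():
    """Run both strategies on a small temporary file and return both outputs."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "in.txt")
        with open(src, "w") as f:
            f.writelines(f"line {i}\n" for i in range(250))
        out1 = os.path.join(tmp, "out1.txt")
        out2 = os.path.join(tmp, "out2.txt")
        copy_line_by_line(src, out1, str.upper)
        copy_in_batches(src, out2, str.upper)
        with open(out1) as a, open(out2) as b:
            return a.read(), b.read()
```

Calling `demo()` returns the two output texts, which should be identical. Any difference between the strategies is one of bookkeeping inside the program, not of disk traffic, as long as the language's I/O library does the buffering.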
EAF@.Prime.COM (08/15/90)
It depends on the way in which your computer buffers data. Many computers double buffer data. That is they read it in as disk blocks and keep it in main memory. When you ask for the next sentence, they get it from the block of memory which contains the whole disk block, avoiding I/O. If double buffering is not used on your computer, you would save time as well as wear and tear by emulating double buffering.

Edward A. Feustel          | efeustel@primerd.prime.com
Prime Computer             | eaf@res-c4.prime.com.xa
500 Old Connecticut Path   | mit-eddie!primerd!efeustel
Framingham, Ma. 01701-4548 | (508)-879-2960 x3846
zenith-steven@cs.yale.edu (Steven Ericsson Zenith) (08/15/90)
In article <126800007@.Prime.COM>, EAF@.Prime.COM writes:
|>
|> It depends on the way in which your computer buffers data. Many computers
|> double buffer data. That is they read it in as disk blocks and keep it
|> in main memory. When you ask for the next sentence, they get it from the
|> block of memory which contains the whole disk block, avoiding I/O.

What you describe here isn't "double buffering", it's just plain old "buffering". "Double buffering", as I understand it, enables data to be read and written concurrently - that is, whilst one buffer is being read another can be filled. Useful if you want to keep your DMA engines busy. Thus, far from avoiding I/O, it allows computation and I/O to overlap.

--
Steven Ericsson Zenith * email: zenith@cs.yale.edu
Fax: (203) 466 2768 | voice: (203) 432 1278
"The tower should warn the people not to believe in it." - P.D.Ouspensky
Yale University Dept of Computer Science 51 Prospect St New Haven CT 06520 USA
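The distinction drawn above can be sketched in Python: exactly two buffers, with a helper thread filling one while the caller processes the other. This is an illustration under stated assumptions, not how any particular operating system does it. The name `process_double_buffered`, the `process` callback, and the 4096-byte chunk size are all hypothetical, and `stream` is assumed to be a blocking binary stream that supports `readinto()`.

```python
import queue
import threading


def process_double_buffered(stream, process, chunk_size=4096):
    """Apply process() to each chunk while a helper thread fills the other buffer."""
    free = queue.Queue()    # buffers waiting to be filled
    filled = queue.Queue()  # (buffer, byte count) pairs waiting to be processed
    for _ in range(2):      # exactly two buffers: one in use, one being filled
        free.put(bytearray(chunk_size))

    def reader():
        try:
            while True:
                buf = free.get()
                n = stream.readinto(buf)
                if not n:   # 0 bytes means end of input
                    break
                filled.put((buf, n))
        finally:
            filled.put((None, 0))  # sentinel: end of input (or a failed read)

    thread = threading.Thread(target=reader, daemon=True)
    thread.start()

    results = []
    while True:
        buf, n = filled.get()
        if not n:
            break
        results.append(process(memoryview(buf)[:n]))
        free.put(buf)       # hand the buffer back so it can be refilled
    thread.join()
    return results
```

For example, `process_double_buffered(open("big.txt", "rb"), bytes)` (with a hypothetical file name) would return the file's contents as a list of chunks. The point of the design is that the read of chunk N+1 can proceed while chunk N is still being processed, which is the overlap of computation and I/O described in the post.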
ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (08/16/90)
In article <126800007@.Prime.COM>, EAF@.Prime.COM writes:
> It depends on the way in which your computer buffers data. Many computers
> double buffer data. That is they read it in as disk blocks and keep it
> in main memory.

That is *NOT* double buffering. That's buffering, pure and simple. Multiple buffering is when you have N buffers in memory (N > 1) and the operating system reads ahead, so that by the time you finish working on one buffer-full the next buffer-full has already been read for you without waiting, or writes behind, so that it is still writing out one buffer while you are filling the next (anyone remember LOCATE mode?).

It's not a function of computers, either. It's a function of (operating systems) and (programming language runtime support libraries). The "stdio" package under UNIX typically does single buffering (a struct _iob contains one pointer to a buffer) with synchronous reads; however, some versions of UNIX will do read-ahead and/or write-behind so that you get _some_ of the benefit of double buffering.

The bottom line for the original poster is
    Use the high-level I/O operations provided in your programming language: READ and WRITE statements in Fortran, READ, READLN, WRITE, WRITELN, GET, PUT in Pascal, any <stdio.h> function or macro in C,
and
    Don't Panic! it's Someone Else's Problem (the programming language vendor's).

--
The taxonomy of Pleistocene equids is in a state of confusion.
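The effect of plain single buffering can be observed with a small Python sketch. It uses Python's `io.BufferedReader` as a stand-in for a library such as stdio (an analogy, not the same code), and a hypothetical `CountingRaw` class that plays the part of the disk file and counts how often the library actually asks it for data.

```python
import io


class CountingRaw(io.RawIOBase):
    """Hypothetical stand-in for a disk file that counts low-level reads."""

    def __init__(self, data):
        super().__init__()
        self._inner = io.BytesIO(data)
        self.reads = 0

    def readable(self):
        return True

    def readinto(self, b):
        self.reads += 1  # one call here corresponds to one request to the "disk"
        return self._inner.readinto(b)


def count_reads(num_lines=1000, buffer_size=4096):
    """Read num_lines lines through one buffer; return (lines, low-level reads)."""
    data = b"".join(b"line %d\n" % i for i in range(num_lines))
    raw = CountingRaw(data)
    buffered = io.BufferedReader(raw, buffer_size=buffer_size)
    lines = sum(1 for _ in buffered)  # the program still reads line by line
    return lines, raw.reads
```

With the default arguments, `count_reads()` should report 1000 lines read by the program but only a handful of low-level reads, since each low-level read brings in up to `buffer_size` bytes. That is the sense in which reading line by line through the language's own I/O operations already avoids most of the I/O.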
EAF@.Prime.COM (08/16/90)
I should have been more careful in separating different OS aspects. Typically a good I/O system will do some buffering for you. Most competent operating systems maintain a cache of disk blocks in memory for you. In addition, if you employ language-oriented I/O, a buffer will be maintained for you in the I/O library's storage pool. When you read a sentence into your buffer, it can come from the language library buffer.

If your language I/O library is intelligent and you are reading sequential data, the language library will call on the OS to read the next disk block into memory, often before it is required. When the new block is needed it will be brought from the cache into the language library buffer and then into your sentence space. While this is not double buffering in the sense that a single entity is aware of it and planning for it, it is double buffering in effect, and it does improve I/O concurrency.
jlg@lanl.gov (Jim Giles) (08/17/90)
From article <126800008@.Prime.COM>, by EAF@.Prime.COM:
> [...]
> If your
> language I/O library is intelligent and you are reading sequential
> data, the language library will call on the OS to read the next disk
> block into memory, often before it is required.
> [...]

Not on UNIX it won't. There is no system call for the library to use which will start a read request and then return control to the library while the read is asynchronously processing. So there's no point in the library trying to read ahead: the system will cause the same delay (statistically - there are variations independent of a given process) whether you read ahead or not. This is one of the most glaring deficiencies of UNIX.

J. Giles
ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (08/18/90)
In article <60345@lanl.gov>, jlg@lanl.gov (Jim Giles) writes:
> From article <126800008@.Prime.COM>, by EAF@.Prime.COM:
> > ..., the language library will call on the OS to read the next block ...
>
> Not on UNIX it won't. There is no system call for the library to use
> which will start a read request and then return control to the library
> while the read is asynchronously processing.
> This is one of the most glaring deficiencies of UNIX.

This is half true. See fcntl(fd, FASYNC, ...) in the programmer's manual. Alas, there are lots of UNIX systems that have _partly_ implemented it... Isn't this one of the things being considered for POSIX?

(In System V, of course, one can always arrange this kind of thing by putting the file buffers in a shared memory segment and setting up another process to actually do the reads. Before trying to do the same thing with light-weight processes, check the fine print in your manual about LWP/IO interactions...)

--
The taxonomy of Pleistocene equids is in a state of confusion.
peter@ficc.ferranti.com (Peter da Silva) (08/20/90)
In article <60345@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> Not on UNIX it won't. There is no system call for the library to use
> which will start a read request and then return control to the library
> while the read is asynchronously processing.

No, but the operating system does this for you.

> This is one of the most glaring deficiencies of UNIX.

No, this is a symptom of the general lack of realtime and transaction processing capabilities which is one of the most glaring deficiencies of UNIX.

--
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
peter@ferranti.com (currently not working)
peter@hackercorp.com