[comp.lang.misc] Query

corre@csd4.csd.uwm.edu (Alan D Corre) (08/15/90)

Although I have been involved in programming for quite a while, my background
is humanistic, so please excuse this query if it displays a certain
ignorance.

I should like to know the following. If I am writing a program that
processes a long text, does it matter, in terms of (1) program efficiency
and (2) wear and tear on disk drives, whether I read a line from a disk
file, process it, and write it out to an output file, as opposed to reading
a whole bunch of lines into memory, processing them, and then writing them
out en bloc to the output file?
In other words, which is preferable:
(1)
while not end of input file do {
  read a line into a variable
  ...process the line...
  write the line to output file}
(2)
set up a list or array of, say, 100 elements
while not end of input file do {
  read up to 100 lines into the array
  ...process the array...
  write the array elements to output file}

..or doesn't it make any difference?
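For concreteness, a minimal C sketch of the two strategies using <stdio.h>. The upper-casing "processing" step, the 256-byte line limit and the function names are hypothetical placeholders; fgets splits any longer line into pieces, which is harmless for per-character processing like this.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

#define MAX_LINE 256   /* hypothetical line-length limit */
#define BATCH    100   /* lines per batch, as in strategy (2) */

/* Hypothetical "processing": convert one line to upper case in place. */
static void process_line(char *line) {
    for (; *line != '\0'; line++)
        *line = (char)toupper((unsigned char)*line);
}

/* Strategy (1): read, process and write one line at a time. */
void copy_line_by_line(FILE *in, FILE *out) {
    char line[MAX_LINE];
    assert(in != NULL && out != NULL);
    while (fgets(line, sizeof line, in) != NULL) {
        process_line(line);
        fputs(line, out);
    }
}

/* Strategy (2): read up to BATCH lines into an array, process them all,
   then write them all; repeat until the input is exhausted. */
void copy_in_batches(FILE *in, FILE *out) {
    static char batch[BATCH][MAX_LINE];   /* about 25 KB, kept off the stack */
    size_t n, i;
    assert(in != NULL && out != NULL);
    do {
        n = 0;
        while (n < BATCH && fgets(batch[n], MAX_LINE, in) != NULL)
            n++;
        for (i = 0; i < n; i++)
            process_line(batch[i]);
        for (i = 0; i < n; i++)
            fputs(batch[i], out);
    } while (n == BATCH);   /* a short batch means end of input */
}
```

Because fgets and fputs already work from stdio's own buffer, both versions typically hand the operating system the same block-sized reads and writes; the array in (2) mostly adds an extra copy.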

Thank you.
--
Alan D. Corre
Department of Hebrew Studies
University of Wisconsin-Milwaukee                     (414) 229-4245
PO Box 413, Milwaukee, WI 53201               corre@csd4.csd.uwm.edu

EAF@.Prime.COM (08/15/90)

It depends on the way in which your computer buffers data.  Many computers
double buffer data.  That is they read it in as disk blocks and keep it
in main memory.  When you ask for the next sentence, they get it from the
block of memory which contains the whole disk block, avoiding I/O.

If double buffering is not used on your computer, you would save time as
well as wear and tear by emulating double buffering.
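A minimal sketch of doing that buffering yourself, assuming a POSIX system on which read() is an unbuffered system call: one read() fetches a whole block, and lines are then handed out from memory until the block is used up. The 4096-byte block size and the br_ names are hypothetical. (This is plain single buffering; C's stdio already does the same thing for you.)

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

#define BLOCK_SIZE 4096   /* hypothetical block size */

struct block_reader {
    int fd;
    char buf[BLOCK_SIZE];
    size_t len;   /* bytes currently held in buf */
    size_t pos;   /* index of the next unread byte */
};

void br_init(struct block_reader *br, int fd) {
    br->fd = fd;
    br->len = br->pos = 0;
}

/* Copy the next line (including its '\n', if any) into out, storing at
   most cap - 1 bytes plus a terminating '\0'.  A new read() is issued
   only when the block in memory is used up.  Returns the number of
   bytes stored; 0 means end of file (or a read error). */
size_t br_getline(struct block_reader *br, char *out, size_t cap) {
    size_t n = 0;
    assert(cap > 0);
    while (n + 1 < cap) {
        if (br->pos == br->len) {              /* block exhausted */
            ssize_t got = read(br->fd, br->buf, BLOCK_SIZE);
            if (got <= 0)
                break;                         /* EOF or error */
            br->len = (size_t)got;
            br->pos = 0;
        }
        char c = br->buf[br->pos++];
        out[n++] = c;
        if (c == '\n')
            break;
    }
    out[n] = '\0';
    return n;
}
```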


Edward A. Feustel                |     efeustel@primerd.prime.com
Prime Computer                   |     eaf@res-c4.prime.com.xa
500 Old Connecticut Path         |     mit-eddie!primerd!efeustel
Framingham, Ma. 01701-4548       |     (508)-879-2960 x3846

zenith-steven@cs.yale.edu (Steven Ericsson Zenith) (08/15/90)

In article <126800007@.Prime.COM>, EAF@.Prime.COM writes:
|> 
|> It depends on the way in which your computer buffers data.  Many computers
|> double buffer data.  That is they read it in as disk blocks and keep it
|> in main memory.  When you ask for the next sentence, they get it from the
|> block of memory which contains the whole disk block, avoiding I/O.

What you describe here isn't "double buffering", it's just plain old
"buffering". "Double buffering", as I understand it, enables data to 
be read and written concurrently - that is, whilst one buffer is being
read another can be filled. Useful if you want to keep your DMA engines
busy. Thus, far from avoiding I/O, it allows computation and I/O to overlap.
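A minimal sketch of double buffering in that sense, assuming POSIX threads as a stand-in for a DMA engine: while the main loop counts newlines in one buffer, a helper thread is already reading into the other. The names, the 4096-byte size and newline counting as the "computation" are hypothetical, and starting a fresh thread per block is wasteful; it only keeps the sketch short.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

#define BUF_SIZE 4096   /* hypothetical buffer size */

struct buffer {
    char data[BUF_SIZE];
    ssize_t len;        /* bytes read; 0 at end of file, -1 on error */
};

struct fill_job {
    int fd;
    struct buffer *buf;
};

/* Fill one buffer with a blocking read(); runs in the helper thread. */
static void *fill(void *arg) {
    struct fill_job *job = arg;
    job->buf->len = read(job->fd, job->buf->data, BUF_SIZE);
    return NULL;
}

/* Count the newlines in everything readable from fd.  While this thread
   scans bufs[cur], the helper reads into bufs[1 - cur], so computation
   and I/O overlap.  Returns -1 on error. */
long count_lines_double_buffered(int fd) {
    struct buffer bufs[2];
    struct fill_job job = { fd, &bufs[0] };
    pthread_t helper;
    long lines = 0;
    int cur = 0;

    assert(fd >= 0);
    fill(&job);                              /* prime the first buffer */
    while (bufs[cur].len > 0) {
        job.buf = &bufs[1 - cur];            /* start filling the other one */
        if (pthread_create(&helper, NULL, fill, &job) != 0)
            return -1;
        for (ssize_t i = 0; i < bufs[cur].len; i++)   /* overlaps the read */
            if (bufs[cur].data[i] == '\n')
                lines++;
        pthread_join(helper, NULL);          /* wait until it is full */
        cur = 1 - cur;                       /* swap roles */
    }
    return bufs[cur].len < 0 ? -1 : lines;
}
```

Swapping cur after each join is the whole trick: the buffer just filled becomes the one processed, and the one just processed becomes the next target for I/O.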

--
Steven Ericsson Zenith              *            email: zenith@cs.yale.edu
Fax: (203) 466 2768                 |            voice: (203) 432 1278
"The tower should warn the people not to believe in it." - P.D.Ouspensky
Yale University Dept of Computer Science 51 Prospect St New Haven CT 06520 USA

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (08/16/90)

In article <126800007@.Prime.COM>, EAF@.Prime.COM writes:
> It depends on the way in which your computer buffers data.  Many computers
> double buffer data.  That is they read it in as disk blocks and keep it
> in main memory.

That is *NOT* double buffering.  That's buffering, pure and simple.
Multiple buffering is when you have N buffers in memory (N > 1) and
the operating system reads ahead, so that by the time you finish working
on one buffer-full the next buffer-full has already been read for you
without waiting, or writes behind, so that it is still writing out one
buffer while you are filling the next (anyone remember LOCATE mode?).

It's not a function of computers, either.  It's a function of operating
systems and of programming-language runtime support libraries.  The "stdio"
package under UNIX typically does single buffering (a struct _iob contains
one pointer to a buffer) with synchronous reads; however, some versions of
UNIX will do read-ahead and/or write-behind so that you get _some_ of the
benefit of double buffering.
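In C, the size and mode of that single stdio buffer can be set with the standard setvbuf function, which has to be called after the stream is opened and before any other operation on it. A minimal sketch; the helper name and the 64 KB size are illustrative assumptions, and a bigger buffer means fewer, larger reads rather than any read-ahead.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define BIG_BUFSIZ (64 * 1024)   /* illustrative size, not a recommendation */

/* Give a freshly opened stream a larger, fully buffered (_IOFBF) stdio
   buffer.  Call before any other operation on fp.  On success returns
   the malloc'd buffer, which must outlive the stream: fclose(fp) first,
   then free it.  On failure returns NULL and fp keeps its default. */
char *use_big_buffer(FILE *fp) {
    char *buf = malloc(BIG_BUFSIZ);
    assert(fp != NULL);
    if (buf != NULL && setvbuf(fp, buf, _IOFBF, BIG_BUFSIZ) != 0) {
        free(buf);
        buf = NULL;
    }
    return buf;
}
```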

The bottom line for the original poster is

	Use the high-level I/O operations provided in your programming
	language: READ and WRITE statements in Fortran, READ, READLN,
	WRITE, WRITELN, GET, PUT in Pascal, any <stdio.h> function or
	macro in C, and

		Don't Panic!

	it's Someone Else's Problem (the programming language vendor's).

-- 
The taxonomy of Pleistocene equids is in a state of confusion.

EAF@.Prime.COM (08/16/90)

I should have been more careful in separating different OS aspects.

Typically a good I/O system will do some buffering for you.  Most competent
operating systems maintain a cache of disk blocks in memory for you.  In
addition, if you employ language-oriented I/O, a buffer will be maintained
for you in the I/O library's storage pool.  When you read a line into
your buffer, it can come from the language library's buffer.  If your
language I/O library is intelligent and you are reading sequential
data, the language library will call on the OS to read the next disk
block into memory, often before it is required.  When the new block is
needed it will be brought from the cache into the language library buffer
and then into your line buffer.

While this is not double buffering in the sense that a single entity is
aware of it and planning for it, it is double buffering and it does improve
I/O concurrency.

jlg@lanl.gov (Jim Giles) (08/17/90)

From article <126800008@.Prime.COM>, by EAF@.Prime.COM:
> [...]
>                                                             If your
> language I/O library is intelligent and you are reading sequential
> data, the language library will call on the OS to read the next disk
> block into memory, often before it is required.
> [...]

Not on UNIX it won't.  There is no system call for the library to use
which will start a read request and then return control to the library
while the read is asynchronously processing.  So, there's no point in
the library trying to read ahead, the system will cause the same delay
(statistically - there are variations independent of a given process)
whether you read ahead or not.

This is one of the most glaring deficiencies of UNIX.

J. Giles

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (08/18/90)

In article <60345@lanl.gov>, jlg@lanl.gov (Jim Giles) writes:
> From article <126800008@.Prime.COM>, by EAF@.Prime.COM:
> > ..., the language library will call on the OS to read the next block ...
> 
> Not on UNIX it won't.  There is no system call for the library to use
> which will start a read request and then return control to the library
> while the read is asynchronously processing.

> This is one of the most glaring deficiencies of UNIX.

This is half true.  See
	fcntl(fd, FASYNC, ...)
in the programmer's manual.  Alas, there are lots of UNIX systems
that have _partly_ implemented it...

Isn't this one of the things being considered for POSIX?
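For the record, POSIX did take this up: the POSIX.1b realtime extensions (1993) added asynchronous I/O, where aio_read queues a read and returns at once, which is the kind of call Jim Giles says is missing. A minimal sketch of the "start the read, compute, then collect the result" pattern; the function name is hypothetical, and older glibc versions need -lrt at link time.

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Start an asynchronous read of len bytes at offset off, run other_work
   (if any) while the read is in progress, then wait for the read to
   finish.  Returns the number of bytes read, or -1 on error. */
ssize_t read_with_overlap(int fd, char *buf, size_t len, off_t off,
                          void (*other_work)(void)) {
    struct aiocb cb;
    const struct aiocb *list[1];

    assert(buf != NULL);
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf = buf;
    cb.aio_nbytes = len;
    cb.aio_offset = off;
    cb.aio_sigevent.sigev_notify = SIGEV_NONE;   /* no completion signal */
    list[0] = &cb;

    if (aio_read(&cb) != 0) {        /* queue the read; returns at once */
        perror("aio_read");
        return -1;
    }
    if (other_work != NULL)
        other_work();                /* computation overlaps the I/O */
    while (aio_error(&cb) == EINPROGRESS)
        aio_suspend(list, 1, NULL);  /* sleep until the read completes */
    return aio_return(&cb);
}
```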

(In System V, of course, one can always arrange this kind of thing
by putting the file buffers in a shared memory segment and setting
up another process to actually do the reads.  Before trying to do
the same thing with light-weight processes, check the fine print
in your manual about LWP/IO interactions...)
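A minimal sketch of the "another process does the reads" idea, with one substitution: a pipe stands in for the System V shared-memory segment, so the kernel's pipe buffer holds whatever the helper has read ahead. The function names and the 4096-byte chunk size are hypothetical.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a helper process that copies everything readable from src_fd
   into a pipe; it can run ahead of the consumer by as much as the pipe
   holds.  Returns the read end of the pipe, or -1 on failure; *pid
   receives the helper's process id. */
int start_reader_process(int src_fd, pid_t *pid) {
    int p[2];
    assert(pid != NULL);
    if (pipe(p) != 0)
        return -1;
    *pid = fork();
    if (*pid < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    if (*pid == 0) {                  /* child: does nothing but read */
        char buf[4096];
        ssize_t n;
        close(p[0]);
        while ((n = read(src_fd, buf, sizeof buf)) > 0) {
            ssize_t off = 0;
            while (off < n) {         /* write() may be partial */
                ssize_t w = write(p[1], buf + off, (size_t)(n - off));
                if (w < 0)
                    _exit(1);
                off += w;
            }
        }
        _exit(n < 0 ? 1 : 0);
    }
    close(p[1]);                      /* parent: consumes from p[0] */
    return p[0];
}

/* Call after reading the pipe to end of file: close it and reap the
   helper.  Returns 0 if the helper finished without error. */
int finish_reader_process(int fd, pid_t pid) {
    int status;
    close(fd);
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```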
-- 
The taxonomy of Pleistocene equids is in a state of confusion.

peter@ficc.ferranti.com (Peter da Silva) (08/20/90)

In article <60345@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> Not on UNIX it won't.  There is no system call for the library to use
> which will start a read request and then return control to the library
> while the read is asynchronously processing.

No, but the operating system does this for you.
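On present-day POSIX systems a program can also tell the operating system that it intends to read a file sequentially, using posix_fadvise (POSIX.1-2001); on Linux this enlarges the kernel's read-ahead window. A minimal sketch; the helper name is hypothetical, the call is purely advisory, and not every system provides it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>

/* Hint that fp's underlying file will be read sequentially, inviting the
   kernel to read ahead.  Returns 0 on success or an error number
   (posix_fadvise reports errors by return value, not via errno). */
int advise_sequential(FILE *fp) {
    assert(fp != NULL);
    /* offset 0 with len 0 means "from the start to the end of the file" */
    return posix_fadvise(fileno(fp), 0, 0, POSIX_FADV_SEQUENTIAL);
}
```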

> This is one of the most glaring deficiencies of UNIX.

No, this is a symptom of the general lack of realtime and transaction
processing capabilities which is one of the most glaring deficiencies
of UNIX.
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
peter@ferranti.com (currently not working)
peter@hackercorp.com