[net.unix-wizards] stdio buffering considered harmful

dan@bbncd@sri-unix.UUCP (07/22/83)

From:  Dan Franklin <dan@bbncd>

UNIX developers were once justifiably proud of the fact that pipes were an
INTERACTIVE way of coupling programs; if the standard input and output of a
pipeline were the terminal, you could type a line at it and immediately see the
result.  But now that stdio is used everywhere, this feature has been "fixed":
a pipeline is usually NOT interactive.  I found this very annoying when I tried
to use m4 with adb, and ended up adding a flag to m4 to force non-buffering.
Bell must have encountered this problem too, since cat -u forces non-buffering.
But cat -u is clearly wrong (even though Rob Pike blesses it); the problem is
with stdio, not with individual commands, and we should not change all the UNIX
commands that might be used in an interactive pipeline to implement -u.  The
question is, what should be done?  I have just encountered this problem again
with a different program, and I would like to solve it once and for all.

One solution would be to have stdio test an fd to see if it is a pipe (somehow),
and line-buffer stdout if it is, treating it like a terminal.  This could be
pretty expensive compared to writing in 1kbyte blocks, so it might be better to
leave the default the way it is and provide a way to tell stdio to line-buffer
on demand.  The obvious way to communicate with stdio, without changing every
command, is to use the environment.  Before writing, stdio would look in the
environment to see if there were any special buffering instructions.  Example:
        STDOUT=LINE_BUFFERED m4 | adb
Is this reasonable?  Is there a better way?

	Dan Franklin

kdp@hplabs.UUCP (Ken Poulton) (07/25/83)

Yes!  I remember interactive pipelines and they were very useful
at times.  'cat -u' is certainly a kluge, and an environment 
variable does seem like the right way to control stdio buffering.  
Note that environment variables need not be set at all unless 
you wish to change from the default buffering.

I guess I would set up an alias to do the environment setting, so
I could make a pipe run interactively with less typing:
          % ip m4 | adb
One might even consider having the shell set that environment
variable for you whenever the final STDOUT came to the terminal.

                                    Ken Poulton
                                    ...!hplabs!kdp

noel@cubsvax.UUCP (07/27/83)

I have run into the stdout buffering headache several times, most
painfully when piping from an interactive fortran program.
(Who wants to hack a -u flag into a large f77 program? I reluctantly did
the equivalent).  My problem, when using "script" or "tee", was that stdout
wasn't flushed when the program did a read on stdin.  Is there a more general
problem? I suppose in some cases you might need LINE buffering even when
it's a unidirectional communication path too (the pipe reader
wants to read line by line (why??)).

Considering just the "interactive" case: why couldn't stdio be smart
enough to check for stdout being a pipe, then flush it when stdin is
read?  This way you wouldn't pay the performance penalty of ALWAYS
line buffering on pipes, but fix the (most common?) problem with the current
scheme.  What am I missing?

	Noel Kropf		harpo!rocky2!cubsvax!noel
	1002 Fairchild		philabs!cmcl2!rocky2!cubsvax!noel
	Columbia University
	New York NY 10027	212-280-5517

dan@bbncd@sri-unix.UUCP (07/29/83)

From:  Dan Franklin <dan@bbncd>

Thanks for the comments.  Since no one came up with anything better than
using the environment, that's what I will do.  In case anyone else is
interested in solving this problem, here's exactly how I plan to do it (when I
finally get around to it).  I plan to have three choices, each set by assigning
a value to STDOUT in the environment:

STDOUT=NBUF     # no buffering
STDOUT=LBUF     # line buffering
STDOUT=BBUF     # block buffering

If there is an explicit setbuf(stdout,...) in the program, it seems like the
environment setting ought to override it, at least in the case where the
environment specifies less buffering than the setbuf.  I plan to have it
override all the time.  The environment variable would also override stdio's
determination of the buffering to use based on isatty().

The same mechanism will also work for STDERR.  (STDERR=BBUF could be handy when
you have a program producing lots of error output, all unbuffered.)
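In modern terms the lookup might be sketched like this (setvbuf() is used here
for concreteness; only the names NBUF/LBUF/BBUF come from the plan above, and
apply_env_buffering() is an invented name):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sketch of the proposed scheme: consult an environment variable
 * (e.g. STDOUT or STDERR) and set the stream's buffering to match. */
void apply_env_buffering(FILE *fp, const char *var)
{
	const char *v = getenv(var);

	if (v == NULL)
		return;                             /* keep stdio's default */
	if (strcmp(v, "NBUF") == 0)
		setvbuf(fp, NULL, _IONBF, 0);       /* no buffering */
	else if (strcmp(v, "LBUF") == 0)
		setvbuf(fp, NULL, _IOLBF, BUFSIZ);  /* line buffering */
	else if (strcmp(v, "BBUF") == 0)
		setvbuf(fp, NULL, _IOFBF, BUFSIZ);  /* block buffering */
}
```

Note that setvbuf() must be applied before the first I/O on the stream, so a
real implementation would do this inside stdio when the stream is first used.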

	Dan Franklin

gwyn%brl-vld@sri-unix.UUCP (07/31/83)

From:      Doug Gwyn (VLD/VMB) <gwyn@brl-vld>

There are two obvious choices to solve the problem of buffered output
not appearing (i.e. being flushed out) before an input operation:

(1)  Write the program so it fflush()es relevant output before trying
the read (this works with all versions of stdio and puts the decision
where it really belongs, in the programmer's lap);

(2)  Write stdio itself so ALL output streams to terminals (and pipes)
are flushed whenever input is attempted from ANY terminal (or pipe).
There is no reason to handle stdin and stdout only, since that only
helps in some cases but does not solve the general problem, if it is
a problem.
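Alternative (1) in miniature (prompt_and_read() is just an illustrative name):

```c
#include <stdio.h>

/* The program itself flushes its prompt before blocking on input,
 * so the prompt appears even when stdout is a pipe. */
int prompt_and_read(FILE *out, FILE *in, char *buf, int n)
{
	fprintf(out, "enter a value: ");
	fflush(out);    /* without this, the prompt can sit in the buffer */
	return fgets(buf, n, in) != NULL;
}
```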

Personally I favor alternative (1).  According to UNIX System V manuals
Bell has adopted alternative (2) for terminals only, not pipes.
Berkeley appears to have adopted (2) for stdin/stdout terminals only.

I think programmers ought to learn to program carefully rather than
rely on library code to make up for their sloppiness.

pdl@root44.UUCP (Dave Lukes) (08/01/83)

Agreed: stdio's buffering is painful (BTW: to all of you out there in the
dark ages: SIII and upwards ALSO have an m4 flag to do line buffering (yuk)).

The problem with the environment hack is that it doesn't really offer fine
enough control (you probably don't want temporary files etc. being line
buffered just because the standard output is a pipe).
I suppose it could be extended to allow you to say `gimme block buffering
on files, single character buffering on terminals and line buffering on pipes',
but the syntax would no doubt be so horrendous that no-one would use it.

(BTBTW: you can GUESS if it's a pipe by doing a seek on it and seeing if
you get errno == ESPIPE, or, with SIII (rah, rah), you can see if 
((statbuf.st_mode&S_IFMT) == S_IFIFO)).
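Both guesses can be wrapped up in one routine (S_ISFIFO() is merely the modern
spelling of ((statbuf.st_mode & S_IFMT) == S_IFIFO), and fd_is_pipe() is an
invented name):

```c
#include <errno.h>
#include <sys/stat.h>
#include <unistd.h>

/* Guess whether fd refers to a pipe: try fstat() first, and fall
 * back to the lseek()/ESPIPE trick if that fails. */
int fd_is_pipe(int fd)
{
	struct stat sb;

	if (fstat(fd, &sb) == 0)
		return S_ISFIFO(sb.st_mode);
	return lseek(fd, 0L, SEEK_CUR) == -1 && errno == ESPIPE;
}
```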

			'Nuff said (he he)
				Dave Lukes (...!vax135!ukc!root44!pdl).

henry@utzoo.UUCP (Henry Spencer) (08/03/83)

At one point, I believe the folks at Duke implemented an alternative
to normal stdio buffering:  all characters supplied by a single call
(e.g. one printf) went out as a single write, but there was no buffering
across calls.  They did this mostly to get the performance of big
writes to ttys without the complexities of trying to guess when to
flush the buffers, but they claimed that it bought most of the speed
of full buffering without the annoyances.  I don't believe they did
this on pipes, but it might be worth considering.  Any notions how
much of a performance penalty it would involve?
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

mjl@ritcv.UUCP (Mike Lutz) (08/04/83)

I disagree with Doug Gwyn's suggestion that programmers should be responsible
for fflush()ing "relevant" output before doing input; what is more, I don't
think it's a question of sloppiness/neatness.  Taken to an extreme, the use of
libraries at all is sloppy (printf is the lazy way of getting around number
conversions, and the window package is a crutch for those too indolent to keep
track of what's on the screen).  I don't think Doug meant to be so pedantic as
to forbid all library routines, but I also believe the issues in an I/O
package's functional design are subtle.

If programmers embed flushing decisions in their code, users are denied the
option of solving the buffering problem THEIR way.  Second, every programmer
has a slightly different notion of when to call fflush(), so the behavior of
commands in a pipeline will be even more unpredictable than now.  (One could
invent "standards" for the use of fflush(), but then why not relieve the
programmer of an unnecessary burden and implement the standard as part of the
library?) Finally, who has the time or patience to go through every program on
the system inserting calls to fflush()?

All of this is not to suggest that I have the ultimate answer to the buffering
problem; I don't (and I have reservations about the proposals that have been
made).  However, if and when a suitable answer becomes available, I hope it's
part and parcel of the standard I/O package.

Mike Lutz
{allegra,seismo}!rochester!ritcv!mjl

trt@rti-sel.UUCP (08/05/83)

The stdout buffering discussion has occurred at least twice before on Usenet.
Some blame stdio for being insufficiently clever,
some blame programmers for not defending against stdio's cleverness,
I blame Dennis Ritchie for not taking the buffering problem seriously.
(He once said he never wanted people to use the 'setbuf' hack,
that buffering in stdio should be transparent.  Alas, it is not.)

HOW TO LIVE WITH STDIO BUFFERING
Programming 'correctly' in the presence of stdio buffering
is a bit hard to define, but you probably need to put an fflush(stdout)
before each of the following system calls:
	read, pause (also 'sleep'), fork, exec[lv], kill, wait, ioctl ('stty')
Oddly, exit(II) is OK since it "knows" about stdio.  Oh yes, do not use lseek.
If you are debugging, put an fflush in just before it core dumps.
Oh, and do an fflush before a long running computation.
Do you need to fflush before a 'system'?  It gets awful, doesn't it.
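The fork case is worth seeing once, because the failure is silent duplication
rather than missing output.  The sketch below (lines_after_fork() is just a
harness name, and a tmpfile stands in for a pipe) writes one line, forks, and
counts how many copies end up in the stream:

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* Without an fflush() before fork(), the text sitting in the stdio
 * buffer is copied into the child and written out twice, once when
 * the child's exit() flushes and once when the parent flushes. */
int lines_after_fork(int do_flush)
{
	FILE *fp = tmpfile();
	char line[64];
	int count = 0;

	setvbuf(fp, NULL, _IOFBF, BUFSIZ);  /* block-buffered, as on a pipe */
	fprintf(fp, "once\n");
	if (do_flush)
		fflush(fp);
	if (fork() == 0)
		exit(0);                    /* exit() flushes the child's copy */
	wait(NULL);
	fflush(fp);                         /* parent flushes its copy */
	rewind(fp);
	while (fgets(line, sizeof line, fp) != NULL)
		count++;
	fclose(fp);
	return count;
}
```

With do_flush set the line appears once; without it, twice.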

I have found that portability and efficiency are well served
by turning on stdio buffering at the very beginning of main():
	setbuf(stdout, BUFFERED_STDIO);
That means I have to put fflush()es all over the place
so the program works, but that is OK because if I do not
the program will not work in the presence of clever stdio systems, anyway.

For portability, the following are handy (but not perfect):
	#ifndef	BUFFERED_STDIO
	#define	BUFFERED_STDIO	(char *)malloc((unsigned)BUFSIZ)
	#endif
	#ifndef	UNBUFFERED_STDIO
	#define	UNBUFFERED_STDIO (char *)NULL
	#endif
Tom Truscott
P.S.  I have an old article on what is wrong with stdio buffering
and how it was fixed at Duke.  Anyone want it re-submitted?

noel@cubsvax.UUCP (08/05/83)

How about a load-time option which would make stdio buffer line-by-line?
One could "cc dumb.c -lflush" and /lib/libflush.a would be a version of
some subset of stdio which did whatever one considers best -- I vote
for either line-by-line or per-printf flushes on ALL output files.
This would obviate the need for inserting fflush()es to extract the
output of a program which bombs due to illegal behaviour like "bus error",
"segmentation violation", funny asynchronous behaviour, etc.

Of course if you really felt you needed to optimize for file writes
and only flush for pipes and ttys, you could have yet another version
of the load library which did that.  Of course the different versions
of the library would be #ifdef compilations.

-- 
	Noel Kropf		harpo!rocky2!cubsvax!noel
	1002 Fairchild		philabs!cmcl2!rocky2!cubsvax!noel
	Columbia University
	New York NY 10027	212-280-5517

gwyn@brl-vld@sri-unix.UUCP (08/13/83)

From:      Doug Gwyn (VLD/VMB) <gwyn@brl-vld>

Your approach certainly seems to assume the least about the details of
any particular STDIO package's buffering algorithm, although it entails
a lot of effort on the part of the programmer (which I find acceptable,
but others haven't).

HOWEVER, some brain-damaged STDIO packages will not let you buffer more
than a line at a time to terminals (and maybe pipes) even via setbuf().
Your suggestions still work in such a case, and that is the best one
could hope for given that STDIO implementation.  It is a pity, though,
that one can't buffer up a screenload of output to a terminal when he is
already taking such pains to setbuf() and fflush() appropriately.

It might be useful for you to re-post your memo on the subject, for
those who haven't been receiving this list very long.