[net.unix-wizards] Subtle bug in fflush and fix

peterson@milano.UUCP (07/08/86)

The stdio package does not seem to properly account for some effects
of signals.  In particular, I have had instances of the following:

1. Output by one process into a pipe or socket, using the stdio
package (specifically putc), buffering output and eventually calling
_flsbuf to actually write the buffer to the pipe or socket.

2. Over time, the writing process gets ahead of the reading process,
causing the internal system block buffers for this pipe/socket to
fill.  Eventually the "write" system call in _flsbuf or fflush
blocks, waiting for the buffered data to be read so that more
can be written.

3.  The reading process was busy because of some interesting other
business and sends a signal to the writing process to let it know
of the interesting events it has had, and then goes back to reading.

4. The signal breaks the write system call (in _flsbuf or fflush)
and returns only a partial write ( 0 < write(...,n) < n) Presumably,
errno is EINTR.  The stdio routines interpret this condition as an
error and return EOF.  The buffer is set empty however, so that the
application program has no way of knowing how much was (and was not)
written.

Correcting this problem requires checking each write to see if some
bytes have been written, and as long as bytes are being written,
continuing to write more.  Eventually, either all bytes will be
written or an error will occur (write(...) <= 0).

The affected code is in flsbuf.c.  A revised version follows:

/* @(#)flsbuf.c	4.6 (Berkeley) 6/30/83 */

/* ************************************************************** */

/*
	There are four routines in this file:

	_flsbuf(c, FILE *)	an attempt was made to put the
				char c into an apparently full buffer

	_cleanup()		flush all buffers on exit

	fclose(FILE *)		flush and close the file

	fflush(FILE *)		flush the buffer
*/

/*
	In general, each FILE * has an I/O buffer allocated for it.
	Output charcters are put into the buffer until it is full.
	The buffer is flushed when 

	it is full and another character is to be added to 
		the buffer (_flsbuf -- called by putc),
	the program terminates (_cleanup), 
	the file is closed (fclose), or 
	it is explicitly emptied (fflush).

	The FILE * structure uses the following variables to keep
	track of the state of the buffer:

	_base		the base address of the buffer; if it is
			NULL, then no space has yet been allocated.

	_bufsiz		the amount of space in the buffer

	_ptr		pointer to the next empty space in the buffer

	_cnt		the number of empty spaces still in the buffer
			normally we would expect _cnt to be 
				_bufsiz - (_ptr - _base)
			but if the file is unbuffered or line buffered,
			_cnt is always set to zero.

	_flag		various flags: see the list in stdio.h
*/

/* ************************************************************** */

#include	<stdio.h>
#include	<sys/types.h>
#include	<sys/stat.h>

char   *malloc ();

_flsbuf(c, iop)
register    FILE * iop;
{

    register int    returncode = c;

    /* I/O to stdout should go through the stdout buffer */
    extern char _sobuf[];


 /* 
  check if the file can be either read or write (IORW) and,
  if so, change it to be in a write state (IOWRT). Also clear
  the read state (IOREAD) and any EOF flag (IOEOF)
  */
    if (iop->_flag & _IORW)
	{
	    iop->_flag |= _IOWRT;
	    iop->_flag &= ~(_IOEOF | _IOREAD);
	}

    /* if the file is not in a write state (IOWRT), exit */
    if ((iop->_flag & _IOWRT) == 0)
	return(EOF);

tryagain: 

    /*  check for line buffering -- flush on buffer full or newline */
    if (iop->_flag & _IOLBF)
	{
	    *iop->_ptr++ = c;
	    if (iop->_ptr >= iop->_base + iop->_bufsiz || c == '\n')
		{
		    if (WriteBuffer(iop) == EOF)
			return(EOF);
		}

	/* setting cnt to zero falsely indicates that there is no room
	   left in the buffer -- forces putc to call flsbuf on every
	   character */
	    iop->_cnt = 0;

	    return(c);
	}

    /*  check for no buffering -- flush immediately */
    if (iop->_flag & _IONBF)
	{
	/* if no buffering is wanted, output the character immediately 
	*/
	    char    c1;
	    int     n;

	    c1 = c;
	    n = write(fileno(iop), &c1, 1);
	    iop->_cnt = 0;

	/* since only one character is being written, it is either
	   written, or we have an error */
	    if (n == 0)
		return(EOF);
	    return(c);
	}

 /* 
  normal case: block buffering.  
  In general, we want to empty the buffer.
  The other possibility is that this is the very first
  call, and we need to allocate space for this file
  */

    if (iop->_base != NULL)
	{
	/* normal case -- empty the buffer */
	/* compute the number of bytes to be written */
	    if (iop->_ptr > iop->_base)
		{
		    returncode = WriteBuffer(iop);
		}
	}
    else
	{
	    /* no space has been allocated for the buffer for
		this file */

	    int     size;

	/* 
	 The _base field points at the buffer space; if it
	 is NULL, we need to allocate space.
	 
	 We want to have our buffer size match the size of
	 the blocks used by the operating system for this
	 file, if possible.  Check if the file exists and has
	 a block size. If not, use a default block size
	 */
	    {
		struct stat stbuf;
		if (fstat(fileno(iop), &stbuf) < 0 ||
			stbuf.st_blksize <= NULL)
		    size = BUFSIZ;
		else
		    size = stbuf.st_blksize;
	    }

	/* for standard output (stdout) to a tty, we should use line
	   buffering. Set the line buffering flag (LBF) and,
	   set the buffer pointers to _sobuf, and start over.
	*/
	    if (iop == stdout)
		{
		    if (isatty(fileno(stdout)))
			{
			    iop->_flag |= _IOLBF;
			    iop->_base = _sobuf;
			    iop->_ptr = _sobuf;
			    iop->_bufsiz = size;
			    goto tryagain;
			}
		}

	/* now malloc space for the i/o buffer -- if no space
	   available, try as unbuffered */

	    if ((iop->_base = malloc(size)) == NULL)
		{
		    iop->_flag |= _IONBF;
		    goto tryagain;
		}

	/* space is allocated; set MYBUF to indicate this, and set the
	   buffer size (bufsiz).  No write is needed yet -- just store
	   the character in the buffer this time and return */
	    iop->_flag |= _IOMYBUF;
	    iop->_bufsiz = size;
	    iop->_ptr = iop->_base;
	    iop->_cnt = iop->_bufsiz;
	    returncode = 0;
	}

 /* 
  now add the extra character that caused the buffer 
  to be flushed to the previously empty buffer.  
  
  -- _cnt is the number of available spaces 
  -- _ptr points to the next available buffer byte
  */

    *iop->_ptr++ = c;
    iop->_cnt -= 1;

    if (returncode == EOF)
	return(EOF);

 /* if no error, return the input character */
    return(c);
}


/* ************************************************************** */

fflush(iop)
register struct _iobuf *iop;
{

    if ((iop->_flag & (_IONBF | _IOWRT)) == _IOWRT
	    && iop->_base != NULL && iop->_ptr > iop->_base)
	return(WriteBuffer(iop));
    return(0);
}

/* ************************************************************** */

/* write a buffer; return EOF if there was any error */

/*
	This code handles the problem that if the
	write system call is interrupted, for example by a signal,
	the write would return without writing the entire buffer.
	In this case, we retry the write until it either fails
	or completes.
*/

static  WriteBuffer (iop)
register struct _iobuf *iop;
{
    register char  *base;
    register int    nBufferBytes;
    register int    nBytesWritten;

    base = iop->_base;
    nBufferBytes = iop->_ptr - base;


    /* write bytes as long as there are bytes to write, and bytes are
    being written */
    do
	{
	    nBytesWritten = write(fileno(iop), base, nBufferBytes);
	    base += nBytesWritten;
	    nBufferBytes -= nBytesWritten;
    } while (nBufferBytes > 0 && nBytesWritten > 0);

    /* reset buffer variables for an empty buffer */
    iop->_ptr = iop->_base;
    if (iop->_flag & (_IOLBF | _IONBF))
	iop->_cnt = 0;
    else
	iop->_cnt = iop->_bufsiz;

    /* check for possible error in the write */
    if (nBytesWritten == -1)
	{
	    iop->_flag |= _IOERR;
	    return(EOF);
	}
    return(0);
}

/* ************************************************************** */
/*
 * Flush buffers on exit
 */

_cleanup()
{
    register struct _iobuf *iop;
    extern struct _iobuf   *_lastbuf;

    for (iop = _iob; iop < _lastbuf; iop++)
	fclose(iop);
}

/* ************************************************************** */

fclose(iop)
register struct _iobuf *iop;
{
    register int    r;

    r = EOF;
    if (iop->_flag & (_IOREAD | _IOWRT | _IORW)
	    && (iop->_flag & _IOSTRG) == 0)
	{
	    /* flush the buffers and close the file */
	    r = fflush(iop);
	    if (close(fileno(iop)) < 0)
		r = EOF;
	    /* if memory was malloc'ed, free it */
	    if (iop->_flag & _IOMYBUF)
		free(iop->_base);
	}

 /* reinitialize the i/o buffer to its initial state */
    iop->_cnt = 0;
    iop->_base = (char *) NULL;
    iop->_ptr = (char *) NULL;
    iop->_bufsiz = 0;
    iop->_flag = 0;
    iop->_file = 0;

    return(r);
}
-- 
James Peterson
peterson@mcc.arpa  or  ...sally!im4u!milano!peterson

gwyn@brl-smoke.ARPA (Doug Gwyn ) (07/10/86)

In article <1781@milano.UUCP> peterson@milano.UUCP writes:
>The stdio package does not seem to properly account for some effects
>of signals.

This is a tremendous understatement.  Use of signals is so
fraught with peril that X3J11 seriously considered guaranteeing
absolutely nothing whatsoever about what a signal handler can
do.  I personally can't even guarantee that a signal handler
function is entered successfully on most machine architectures.

The problems are not as well known as they should be.  Signals
can impact the design of much of the C library.  AT&T reworked
the standard I/O routines in an attempt to permit the use of
printf() in a signal handler, but they didn't get it quite right
because it is NOT POSSIBLE to get it right, at least not without
severe performance impact.  Many experienced UNIX system
programmers avoid using signals for anything other than
	(1) job control, on Berkeley-like systems;
	(2) synchronous notification of state changes, such
		as SIGWINCH;
	(3) a means of requesting termination of a process.
Certainly they should not be used for IPC.

This is not a criticism of Peterson's bug fixes.  That just
prompted me to mention this in case there is anyone who needs
to be warned.

garry@batcomputer.TN.CORNELL.EDU (Garry Wiegand) (07/11/86)

In a recent article gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) wrote:
>...
>The problems are not as well known as they should be.  Signals
>can impact the design of much of the C library.  AT&T reworked
>the standard I/O routines in an attempt to permit the use of
>printf() in a signal handler, but they didn't get it quite right
>because it is NOT POSSIBLE to get it right, at least not without
>severe performance impact...

I can see it would be tough to do at user level. But I don't see why 
the kernel read() and write() routines can't hang a simple queue on
their front ends. Elucidate, please?
-- 
garry wiegand   (garry%cadif-oak@cu-arpa.cs.cornell.edu)

gwyn@brl-smoke.UUCP (07/12/86)

In article <603@batcomputer.TN.CORNELL.EDU> garry%cadif-oak@cu-arpa.cs.cornell.edu writes:
>In a recent article gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) wrote:
>>The problems are not as well known as they should be.  Signals
>>can impact the design of much of the C library.  AT&T reworked
>>the standard I/O routines in an attempt to permit the use of
>>printf() in a signal handler, but they didn't get it quite right
>>because it is NOT POSSIBLE to get it right, at least not without
>>severe performance impact...
>I can see it would be tough to do at user level. But I don't see why 
>the kernel read() and write() routines can't hang a simple queue on
>their front ends. Elucidate, please?

I'm afraid it would be a long and boring story and may violate
our non-disclosure agreement to go into the details, but...

The fundamental source of difficulty is the fact that the
standard I/O library (as well as some other C library
routines) maintain static data structures that can be
left in an inconsistent state when an asynchronous interrupt
occurs.  Using Berkeley-style "reliable signal" facilities,
which seem to have finally made it (in some form) into UNIX
System V, one could establish "conditional critical regions"
that hold signals during times that the data structures are
being rearranged.  However, since that adds system calls
there is a performance penalty.  Without reliable signals,
one would have to resort to semaphore mechanisms, which are
either expensive (System V sem*() facilities) or may require
special hardware support.  (Although sometimes one can exploit
conventional facilities; I once had a semaphore package for
the PDP-11 that used ASRB and INC to obtain atomic interlock.)
The semaphore approach greatly complicates the library code,
since it has to complete the interrupted code's work if a
critical region is in progress when an interrupt occurs.  If
a second interrupt comes along there can be a royal mess.
There are other approaches (e.g., coroutining) that I can
think of but they're even more elaborate than this.

This situation is one of the reasons that ANSI X3J11 had to
decide just how much to promise could be guaranteed for a
signal catching function.  (We didn't promise much; you can
abort() or longjmp() or do some implementation-specified
atomic variable modify or something else I forget.)