[comp.lang.c] problem with fread/fwrite

siva@bally.Bally.COM (Siva Chelliah) (11/07/90)

I have an interesting (?) question.  

Program 1 : 

#include "stdio.h"
char buf[7];
main ()
{
  int i=1;
  FILE *fp;
  fp = fopen("temp.dat","ab");
  for (i = 0;i<5;i++){
     sprintf(buf,"line %d",i);
     fwrite(buf,sizeof(buf),1,fp);
  }
  fclose(fp);
}
   
temp.dat : 

line 0line 1line 2line 3line 4


fread should update the  pointer, so that I should be able to do a read or 
a write after that. Right ?

Program 2 :

#include "stdio.h"
char buf[7] = "line 4";
char tbuf[7];
main ()
{
  int i=1;
  FILE *fp;
  fp = fopen("temp.dat","r+b");
  fread(tbuf,sizeof(tbuf),1,fp);
  printf("tbuf = %s\n",tbuf);     /* this worked . I got line 0 */
  fwrite(buf,sizeof(buf),1,fp);
  fclose(fp);
}

temp.dat : 

line 0line 1line 2line 3line 4line 0line 4

Can you believe this?  This happened when I used IBM RT, AIX 2.0
When I used Microsoft C 5.1(DOS 3.3 ) , nothing changed in temp.dat .
When I used fseek before fwrite , it worked.  I do not remember reading 
anywhere that I should do a fseek before fread/ fwrite.  Is that a bug in the 
compiler or in my head ?  Please help.

Siva

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (11/07/90)

In article <402@bally.Bally.COM>, siva@bally.Bally.COM (Siva Chelliah) writes:
> I do not remember reading 
> anywhere that I should do a fseek before fread/ fwrite.  Is that a bug in the 
> compiler or in my head ?  Please help.

It's in your head.  Basically the model is that a stream can be in one
of three states:
	undetermined
	reading
	writing
When you read from a stream, it should not be in 'writing' state.
When you write to a stream, it should not be in 'reading' state.
fseek() and rewind() put a stream back into 'undetermined' state.
This has been the case at least since V7 stdio, maybe longer.
Check your documentation again; it may be under "fopen" where the
meaning of r+ is explained.
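
As a rough illustration only, the three-state model can be written down as
a tiny transition checker.  The names below (stream_state, model_read,
model_write, model_seek) are invented for this sketch and are not part of
any real stdio.

```c
#include <stdbool.h>

/* Hypothetical model of the three states described above; not real stdio. */
enum stream_state { UNDETERMINED, READING, WRITING };

/* A read is legal unless the stream is currently in the writing state. */
static bool model_read(enum stream_state *s)
{
    if (*s == WRITING)
        return false;      /* caller forgot the fseek()/rewind() */
    *s = READING;
    return true;
}

/* A write is legal unless the stream is currently in the reading state. */
static bool model_write(enum stream_state *s)
{
    if (*s == READING)
        return false;
    *s = WRITING;
    return true;
}

/* fseek() and rewind() put the stream back into the undetermined state. */
static void model_seek(enum stream_state *s)
{
    *s = UNDETERMINED;
}
```

In this model, Program 2 is a model_read() followed directly by a
model_write(), which is the rejected transition.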
-- 
The problem about real life is that moving one's knight to QB3
may always be replied to with a lob across the net.  --Alasdair Macintyre.

chris@mimsy.umd.edu (Chris Torek) (11/07/90)

In article <402@bally.Bally.COM> siva@bally.Bally.COM (Siva Chelliah) writes:
>When I used fseek before fwrite , it worked.  I do not remember reading 
>anywhere that I should do a fseek before fread/ fwrite.  Is that a bug in the 
>compiler or in my head ?

Put that way, the answer has to be `in your head'. :-)  ANSI standard
X3.159-1989 says that you (the programmer) must call fseek or rewind or
fsetpos before switching from reading to writing or vice versa.  (In
fact, the wording refers to `a successful seek operation', suggesting
that not only must you call fseek or fsetpos or rewind, but also that
if the seek fails, you may not change I/O direction.)
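
A minimal sketch of the rule, assuming the 7-byte records of the original
programs: read one record, do a no-op seek, then write.  The helper name
overwrite_second_record and the RECLEN macro are made up for illustration.

```c
#include <stdio.h>

#define RECLEN 7   /* record size used in the original programs */

/*
 * Read the first record of `path`, then overwrite the second one with `rec`.
 * Returns 0 on success, -1 on any failure.
 */
static int overwrite_second_record(const char *path, const char rec[RECLEN])
{
    char first[RECLEN];
    int ok = 0;
    FILE *fp = fopen(path, "r+b");

    if (fp == NULL)
        return -1;
    if (fread(first, sizeof first, 1, fp) == 1 &&
        fseek(fp, 0L, SEEK_CUR) == 0 &&     /* switch: reading -> writing */
        fwrite(rec, RECLEN, 1, fp) == 1)
        ok = 1;
    if (fclose(fp) != 0)                    /* a failed close can lose data */
        ok = 0;
    return ok ? 0 : -1;
}
```

The fseek(fp, 0L, SEEK_CUR) does not move the file position; it only gives
stdio the chance to resynchronise before the direction changes.  The write
is attempted only if the seek succeeded.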

Incidentally:

>char tbuf[7];
>  fread(tbuf,sizeof(tbuf),1,fp);
>  printf("tbuf = %s\n",tbuf);

This is a bug waiting to happen.  (In this test program, of course,
the fread returns 1, having read 7 bytes, of which the last is a '\0'
character, so it is okay here, sort of.)  It is dangerous to print a
`string' that has been read in via fread or read, since neither is
guaranteed to store a '\0' at the end.  (It is also dangerous to ignore
return values, but. . . .)
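
One way to avoid both problems, sketched under the assumption that the
caller wants a printable string: read at most one byte less than the buffer
holds, use the return value, and store the terminator yourself.  The name
read_text is hypothetical.

```c
#include <stdio.h>

/*
 * Read up to `cap - 1` bytes from fp into buf and NUL-terminate the result.
 * Returns the number of bytes actually read (0 on EOF or error); the caller
 * can tell the two apart with feof()/ferror().
 */
static size_t read_text(char *buf, size_t cap, FILE *fp)
{
    size_t n;

    if (cap == 0)
        return 0;
    n = fread(buf, 1, cap - 1, fp);   /* element size 1, so n counts bytes */
    buf[n] = '\0';                    /* fread never adds the terminator */
    return n;
}
```

With this, something like
`if (read_text(tbuf, sizeof tbuf, fp) > 0) printf("tbuf = %s\n", tbuf);`
is safe even when the file contains no '\0' bytes at all.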
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 405 2750)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris

gwyn@smoke.brl.mil (Doug Gwyn) (11/07/90)

In article <402@bally.Bally.COM> siva@bally.Bally.COM (Siva Chelliah) writes:
-  fp = fopen("temp.dat","r+b");
-  fread(tbuf,sizeof(tbuf),1,fp);
-  fwrite(buf,sizeof(buf),1,fp);
-line 0line 1line 2line 3line 4line 0line 4
-Can you believe this?

Sure.

-When I used fseek before fwrite, it worked.  I do not remember reading 
-anywhere that I should do a fseek before fread/fwrite.

The exact requirement is spelled out quite explicitly in the C standard,
section 4.9.5.3.  I won't bore you with the technical reasons, but this
was not an oversight nor necessarily sloppiness on the part of your C vendor.

hagins@dg-rtp.dg.com (Jody Hagins) (11/08/90)

In article <402@bally.Bally.COM>, siva@bally.Bally.COM (Siva Chelliah) writes:
|> 
|> I have an interesting (?) question.  

[ program 1 deleted ]

|> fread should update the  pointer, so that I should be able to do a read or 
|> a write after that. Right ?

Yes.

|> Program 2 :
|> 
|> #include "stdio.h"
|> char buf[7] = "line 4";
|> char tbuf[7];
|> main ()
|> {
|>   int i=1;
|>   FILE *fp;
|>   fp = fopen("temp.dat","r+b");
|>   fread(tbuf,sizeof(tbuf),1,fp);
|>   printf("tbuf = %s\n",tbuf);     /* this worked . I got line 0 */
|>   fwrite(buf,sizeof(buf),1,fp);
|>   fclose(fp);
|> }
|> 
|> temp.dat : 
|> 
|> line 0line 1line 2line 3line 4line 0line 4
|> 
|> Can you believe this?  This happened when I used IBM RT, AIX 2.0
|> When I used Microsoft C 5.1(DOS 3.3 ) , nothing changed in temp.dat .
|> When I used fseek before fwrite , it worked.  I do not remember reading 
|> anywhere that I should do a fseek before fread/ fwrite.  Is that a bug in the 
|> compiler or in my head ?  Please help.

Would you believe, in your head?
The following is a quote from "C A Reference Manual" by 
Harbison and Steele pertaining to fopen().

"When a file is opened for update ('+' is present in the type
string), the resulting stream may be used for both input and
output.  However, an output operation may not be followed by
an input operation without an intervening call to fseek() or
rewind(), and an input operation may not be followed by an
output operation without an intervening call to fseek() or
rewind() or an input operation that encounters end-of-file"
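
The output-then-input half of that rule might look like the sketch below.
The function name write_then_read is invented here.  (The ANSI wording also
accepts fflush() between output and input; rewind() is used because the
data has to be read from the start anyway.)

```c
#include <stdio.h>

/*
 * Write `len` bytes to a fresh update-mode file, then read them back.
 * Returns the number of bytes read back, or 0 on failure.
 */
static size_t write_then_read(const char *path, const char *src,
                              char *dst, size_t len)
{
    size_t n = 0;
    FILE *fp = fopen(path, "w+b");

    if (fp == NULL)
        return 0;
    if (fwrite(src, 1, len, fp) == len) {
        rewind(fp);                 /* required between output and input */
        n = fread(dst, 1, len, fp);
    }
    fclose(fp);
    return n;
}
```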

Hope this helps!

|> 
|> Siva


	-Jody
	hagins@gamecock.rtp.dg.com

kpv@ulysses.att.com (Phong Vo[drew]) (11/08/90)

In article <14384@smoke.brl.mil>, gwyn@smoke.brl.mil (Doug Gwyn) writes:
- In article <402@bally.Bally.COM> siva@bally.Bally.COM (Siva Chelliah) writes:
- -  fp = fopen("temp.dat","r+b");
- -  fread(tbuf,sizeof(tbuf),1,fp);
- -  fwrite(buf,sizeof(buf),1,fp);
- -line 0line 1line 2line 3line 4line 0line 4
- -Can you believe this?
- 
- Sure.
- 
- -When I used fseek before fwrite, it worked.  I do not remember reading 
- -anywhere that I should do a fseek before fread/fwrite.
- 
- The exact requirement is spelled out quite explicitly in the C standard,
- section 4.9.5.3.  I won't bore you with the technical reasons, but this
- was not an oversight nor necessarily sloppiness on the part of your C vendor.

However, one may argue that the sloppiness is in the C standard.
The standard, in this case, basically just documents the behavior of stdio
without considering that this is a bad design that arose from
a bad implementation. It is ugly to have to call fseek before switching modes.
There are other uglinesses (e.g., inconsistent interfaces) in stdio that
could have been avoided too. One may say that the standard failed in that
respect. This is sad considering that the standard did go a long way to
invent a new C language.

	Phong Vo

dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi) (11/12/90)

In <13992@ulysses.att.com> kpv@ulysses.att.com (Phong Vo[drew]) writes:

   The standard, in this case, basically just documents the behavior of
   stdio without considering that this is a bad design that arose from
   a bad implementation. It is ugly to have to call fseek before
   switching modes.

I believe the requirement to call fseek (etc.) when switching arises
out of the need to make stdio fast.  Due to buffering, alternating
reads and writes can confuse each other.  The only way the stdio
library could automatically protect you against this would be for it to
explicitly test for internal state before every read and write.  E.g.,
within fread, we would have:

     if (my_state == DOING_WRITE) {
        .. resync buffer ..
        my_state = DOING_READ;
     }
     .. rest of fread ..

I suppose we should consider ourselves lucky we are even allowed to do
both reads and writes on the same data stream:

     I had the blues
     because I had no shoes
     Until upon the street
     I met a man whose feet
     were stuck in Pascal.
--
Rahul Dhesi <dhesi%cirrusl@oliveb.ATC.olivetti.com>
UUCP:  oliveb!cirrusl!dhesi

cechew@bruce.cs.monash.OZ.AU (Earl Chew) (11/13/90)

In <2677@cirrusl.UUCP> dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi) writes:

>In <13992@ulysses.att.com> kpv@ulysses.att.com (Phong Vo[drew]) writes:

>   The standard, in this case, basically just documents the behavior of
>   stdio without considering that this is a bad design that arose from
>   a bad implementation. It is ugly to have to call fseek before
>   switching modes.

I think that this is true.

>I believe the requirement to call fseek (etc.) when switching arises
>out of the need to make stdio fast.  Due to buffering, alternating

This is not the case. The main obstacle to switching between reads and writes
is:

1. the behaviour of early implementations of stdio
2. subsequent casting of (1) in concrete by ANSI-C

There is a need to make stdio fast --- but this does not prohibit arbitrary
switching between read and write modes.

Most implementations of stdio buffer data between calls to read(2) and
write(2). Thus the cost of making a system call is only incurred every BUFSIZ
bytes. Intermediate data is transferred directly to the buffer.

The main impediment to switching modes in many implementations of stdio is the
use of a single buffer pointer (usually _ptr). This single pointer functions as
a read pointer when reading and a write pointer when writing, allowing quick
access to the buffer. Calls to a buffer fill or flush function are only made
when the pointer reaches some high water mark. Thus (getc(fp); putc(0, fp)) or
(putc(0, fp); getc(fp)) will misbehave, especially when the pointer is in the
middle of the buffer.

It is possible to perform an automatic mode switch if *two* pointers are used:
a reading pointer and a writing pointer.

>reads and writes can confuse each other.  The only way the stdio
>library could automatically protect you against this would be for it to
>explicitly test for internal state before every read and write.  E.g.,
>within fread, we would have:

>     if (my_state == DOING_WRITE) {
>        .. resync buffer ..
>        my_state = DOING_READ;
>     }
>     .. rest of fread ..

Some implementations of stdio do this anyway to prevent users from hanging
themselves:

	if (my_state == DOING_WRITE) {
	  ... error ...
	}
	... rest of fread ...

In these cases, there already is a guard on the fread() code, so replacing
`... error ...' with `... resync buffer ...' is possible without loss in
performance for the normal case.

I am unsure whether ANSI-C prohibits stdio implementations from automatic
switching, but it is clear that if such a feature were to be implemented, its use
would make the application non-conforming.

In any event, use of the separate read and write pointers allows runtime
checking to ensure that an explicit switch is made between read and write
modes, even if automatic switching is not implemented (ie it is possible to
trap {getc(fp); putc(0, fp);} or {putc(0, fp); getc(fp);}).

Earl
-- 
Earl Chew, Dept of Computer Science, Monash University, Australia 3168
EMAIL: cechew@bruce.cs.monash.edu.au PHONE: 03 5655447 FAX: 03 5655146
----------------------------------------------------------------------

chris@mimsy.umd.edu (Chris Torek) (11/13/90)

In article <2677@cirrusl.UUCP> dhesi%cirrusl@oliveb.ATC.olivetti.com
(Rahul Dhesi) writes:
>I believe the requirement to call fseek (etc.) when switching arises
>out of the need to make stdio fast.  Due to buffering, alternating
>reads and writes can confuse each other.  The only way the stdio
>library could automatically protect you against this would be for it to
>explicitly test for internal state before every read and write.

Although this is (effectively) the reason the V7 Unix stdio and all its
descendants (and, presumably, whatever predecessor eventually became
the USG stdio and thence the System V stdio, though I have not looked
closer than determining that the SVR3 stdio was absolutely horrid
inside) ... where was I?  Oh yes, the reason most Unix stdios do not
check.  Right.

Your average out-of-the-box Unix stdio has, for efficiency, two
particular state variables in each FILE.  One is a pointer into a
current buffer, and the other is a count.  For `getc' operations, if
the count is positive, one decrements it and fetches through the
pointer, which is then incremented.  For `putc' operations, if the
count is positive, one decrements it and stores through the pointer,
which is then incremented.  This means that buffered I/O, which
typically stores somewhere between 512 and 65536 characters in each
buffer, can handle somewhere between 511 and 65535 `calls' to `getc' or
`putc' within an inline macro expansion.  Unfortunately, it also means
that

	fp = fopen("foo", "w+");
	...
	putc(' ', fp);
	c = getc(fp);

tends to `get' a random value (whatever happened to be in the current
buffer).
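
To make the failure concrete, here is a toy model of the one-pointer,
one-count scheme.  It is a deliberately simplified sketch: the names
toy_file, toy_getc and toy_putc are invented, and the slow path (buffer
refill or flush) is left out and simply reported as TOY_EOF.

```c
/* Toy model of a traditional FILE: one pointer and one count, shared by
 * reading and writing.  Names are illustrative, not any vendor's. */
struct toy_file {
    unsigned char *ptr;   /* next byte to read or write */
    int cnt;              /* bytes left before the slow path is needed */
};

#define TOY_EOF (-1)

/* Fast path of getc: decrement the count, fetch through the pointer. */
static int toy_getc(struct toy_file *f)
{
    return --f->cnt >= 0 ? *f->ptr++ : TOY_EOF;
}

/* Fast path of putc: decrement the count, store through the pointer. */
static int toy_putc(int c, struct toy_file *f)
{
    return --f->cnt >= 0 ? (*f->ptr++ = (unsigned char)c) : TOY_EOF;
}
```

Because neither fast path knows which direction the stream is going, a
toy_putc() followed by a toy_getc() hands back whatever stale byte sits
next in the buffer.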

This particular `feature' is easy to fix without sacrificing
efficiency.  Instead of carrying one count and one pointer, stdio can
carry *two* counts (and, as it turns out, one pointer).  The current
read or write state is then stored implicitly in the two counts (as
well as explicitly elsewhere, of course).  The following extracts from
my <stdio.h> should give you the idea.

/*
 * Stdio buffers.
 */
struct __sbuf {
	unsigned char *_base;
	int	_size;
};

/*
 * Stdio state variables.
 *
 * The following always hold:
 *
 *	if (_flags&(__SLBF|__SWR)) == (__SLBF|__SWR),
 *		_lbfsize is -_bf._size, else _lbfsize is 0
 *	if _flags&__SRD, _w is 0
 *	if _flags&__SWR, _r is 0
 *
 * This ensures that the getc and putc macros (or inline functions) never
 * try to write or read from a file that is in `read' or `write' mode.
 * (Moreover, they can, and do, automatically switch from read mode to
 * write mode, and back, on "r+" and "w+" files.)
 *
 * _lbfsize is used only to make the inline line-buffered output stream
 * code as compact as possible.
 *
 * _ub, _up, and _ur are used when ungetc() pushes back more characters
 * than fit in the current _bf, or when ungetc() pushes back a character
 * that does not match the previous one in _bf.  When this happens,
 * _ub._base becomes non-nil (i.e., a stream has ungetc() data iff
 * _ub._base!=NULL) and _up and _ur save the current values of _p and _r.
 */
typedef	struct __sFILE {
	unsigned char *_p;	/* current position in (some) buffer */
	int	_r;		/* read space left for getc() */
	int	_w;		/* write space left for putc() */
	short	_flags;		/* flags, below; this FILE is free if 0 */
	short	_file;		/* fileno, if Unix descriptor, else -1 */
	struct	__sbuf _bf;	/* the buffer (at least 1 byte, if !NULL) */
	int	_lbfsize;	/* 0 or -_bf._size, for inline putc */

	/* operations */
	void	*_cookie;	/* cookie passed to io functions */
#if __STDC__ || c_plusplus
	int	(*_read)(void *_cookie, char *_buf, int _n);
	int	(*_write)(void *_cookie, const char *_buf, int _n);
	fpos_t	(*_seek)(void *_cookie, fpos_t _offset, int _whence);
	int	(*_close)(void *_cookie);
#else
	int	(*_read)();
	int	(*_write)();
	fpos_t	(*_seek)();
	int	(*_close)();
#endif

	/* separate buffer for long sequences of ungetc() */
	struct	__sbuf _ub;	/* ungetc buffer */
	unsigned char *_up;	/* saved _p when _p is doing ungetc data */
	int	_ur;		/* saved _r when _r is counting ungetc data */

	/* tricks to meet minimum requirements even when malloc() fails */
	unsigned char _ubuf[3];	/* guarantee an ungetc() buffer */
	unsigned char _nbuf[1];	/* guarantee a getc() buffer */

	/* separate buffer for fgetline() when line crosses buffer boundary */
	struct	__sbuf _lb;	/* buffer for fgetline() */

	/* Unix stdio files get aligned to block boundaries on fseek() */
	int	_blksize;	/* stat.st_blksize (may be != _bf._size) */
	int	_offset;	/* current lseek offset */
} FILE;

extern FILE __sF[];

#define	__SLBF	0x0001		/* line buffered */
#define	__SNBF	0x0002		/* unbuffered */
#define	__SRD	0x0004		/* OK to read */
#define	__SWR	0x0008		/* OK to write */
	/* RD and WR are never simultaneously asserted */
#define	__SRW	0x0010		/* open for reading & writing */
#define	__SEOF	0x0020		/* found EOF */
#define	__SERR	0x0040		/* found error */
#define	__SMBF	0x0080		/* _buf is from malloc */
#define	__SAPP	0x0100		/* fdopen()ed in append mode */
#define	__SSTR	0x0200		/* this is an sprintf/snprintf string */
#define	__SOPT	0x0400		/* do fseek() optimisation */
#define	__SNPT	0x0800		/* do not do fseek() optimisation */
#define	__SOFF	0x1000		/* set iff _offset is in fact correct */
#define	__SMOD	0x2000		/* true => fgetline modified _p text */

	[much deleted]

/*
 * The __sfoo macros are here so that we can 
 * define function versions in the C library.
 */
#define	__sgetc(p) (--(p)->_r < 0 ? __srget(p) : (int)(*(p)->_p++))
#ifdef __GNUC__
static __inline int __sputc(int _c, FILE *_p) {
	if (--_p->_w >= 0 || (_p->_w >= _p->_lbfsize && (char)_c != '\n'))
		return (*_p->_p++ = _c);
	else
		return (__swbuf(_c, _p));
}
#else
/*
 * This has been tuned to generate reasonable code on the vax using pcc
 */
#define	__sputc(c, p) \
	(--(p)->_w < 0 ? \
		(p)->_w >= (p)->_lbfsize ? \
			(*(p)->_p = (c)), *(p)->_p != '\n' ? \
				(int)*(p)->_p++ : \
				__swbuf('\n', p) : \
			__swbuf((int)(c), p) : \
		(*(p)->_p = (c), (int)*(p)->_p++))
#endif
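
The invariant "_r is 0 when writing, _w is 0 when reading" can be shown
with a much smaller toy.  This is a sketch in the spirit of the extract
above, not the real code: toy2, toy2_getc, toy2_putc and TOY2_SLOW are
invented names, and the slow path only records that it was reached instead
of flushing or refilling.

```c
/* Toy two-count stream; names and slow-path behaviour are assumptions. */
struct toy2 {
    unsigned char *p;     /* current position in the buffer */
    int r;                /* read space left; 0 whenever writing */
    int w;                /* write space left; 0 whenever reading */
    int slow_calls;       /* how many times a slow path was taken */
};

#define TOY2_SLOW (-2)    /* placeholder result for "slow path needed" */

static int toy2_getc(struct toy2 *f)
{
    if (--f->r < 0) {     /* nothing readable: refill or mode switch needed */
        f->r = 0;
        f->slow_calls++;
        return TOY2_SLOW;
    }
    return *f->p++;
}

static int toy2_putc(int c, struct toy2 *f)
{
    if (--f->w < 0) {     /* nothing writable: flush or mode switch needed */
        f->w = 0;
        f->slow_calls++;
        return TOY2_SLOW;
    }
    return *f->p++ = (unsigned char)c;
}
```

A getc on a stream that is currently writing sees r == 0 and drops into
the slow path, where a real implementation can flush and switch modes; the
fast path costs no more than it did with a single count.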
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 405 2750)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris

karl@ima.isc.com (Karl Heuer) (11/14/90)

In article <3337@bruce.cs.monash.OZ.AU> cechew@bruce.cs.monash.OZ.AU (Earl Chew) writes:
>The main obstacle to switching between reads and writes is:
>1. the behaviour of early implementations of stdio
>2. subsequent casting of (1) in concrete by ANSI-C

X3J11 did not freeze this behavior.  They declined to correct it (and quite
properly so, if there was no existing practice), but the fix is a valid
conforming extension.  It would even be possible for some other standard, like
POSIX, to require it.

>I am unsure whether ANSI-C prohibits stdio implementations from automatic
>switching, but it is clear that if such a feature were to be implemented, its
>use would make the application non-conforming.

True (as does, say, the use of "isatty()").  But if the vendors add it now, it
might be required behavior by the time C-2001 is done.

Karl W. Z. Heuer (karl@ima.isc.com or uunet!ima!karl), The Walking Lint

jon@jonlab.UUCP (Jon H. LaBadie) (11/16/90)

Despite all the discussion on this topic, I do not see the need for
the programmer to indicate a switch from reading to writing and vice
versa.  I mean I know it is needed, but I do not understand why.

If in the stdio buffer I have the following;

	Mary had a big sheep.  Supercalifragalisticexpalidocious ...
	^

With my pointer (read in this case) on the 'M', after I fread 23 bytes,
so my buffer and pointer are such:

	Mary had a big sheep.  Supercalifragalisticexpalidocious ...
	                       ^

what is wrong with fwrite'ing "Jack and Jill" on top of Super...?
I.e. what is critical about returning to some ground zero state before
making a transition?

Jon

-- 
Jon LaBadie
{att, princeton, bcr, attmail!auxnj}!jonlab!jon

les@chinet.chi.il.us (Leslie Mikesell) (11/16/90)

In article <27633@mimsy.umd.edu> chris@mimsy.umd.edu (Chris Torek) writes:

>[...] though I have not looked
>closer than determining that the SVR3 stdio was absolutely horrid
>inside)

It seems real strange to me that when you use setvbuf, the first
putc() will trigger a write() (on AT&T 3b2 & 386 SysVr3.2 anyway).
This kind of defeats the purpose of requesting the buffer, doesn't
it?

Anyway, the nicest thing about stdio is that you are not obligated
to use it.  The only thing difficult at all to do using your own
buffering is an equivalent to fprintf().  Has anyone built something
like sprintf that can be limited to a fixed buffer size and maintains
state so you can pick up where you quit on the last pass?  It might
return either the number of characters placed in the buffer (if they
all fit) or a negative number indicating the buffer was filled and you
need to call again to get the rest. 
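
One possible shape for such an interface, as a sketch only.  It sidesteps
truly incremental formatting: the whole string is formatted once into heap
storage and then handed out in pieces.  It also leans on vsnprintf from
C99, which postdates this thread.  The names chunk_fmt, chunk_fmt_start and
chunk_fmt_next are hypothetical.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical resumable formatter state; all names are illustrative. */
struct chunk_fmt {
    char *text;      /* fully formatted output (heap) */
    size_t len;      /* total length of text */
    size_t pos;      /* how much has been handed out so far */
};

/* Format once into heap storage.  Returns 0 on success, -1 on failure. */
static int chunk_fmt_start(struct chunk_fmt *cf, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(NULL, 0, fmt, ap);    /* C99: returns the needed length */
    va_end(ap);
    if (n < 0 || (cf->text = malloc((size_t)n + 1)) == NULL)
        return -1;
    va_start(ap, fmt);
    vsnprintf(cf->text, (size_t)n + 1, fmt, ap);
    va_end(ap);
    cf->len = (size_t)n;
    cf->pos = 0;
    return 0;
}

/*
 * Copy the next piece into buf (size must be nonzero; no terminator added).
 * Returns the count copied if that finished the output, or minus the
 * count copied if the buffer filled and more remains.
 */
static int chunk_fmt_next(struct chunk_fmt *cf, char *buf, size_t size)
{
    size_t left = cf->len - cf->pos;
    size_t n = left < size ? left : size;

    if (n > 0)
        memcpy(buf, cf->text + cf->pos, n);
    cf->pos += n;
    if (cf->pos < cf->len)
        return -(int)n;
    free(cf->text);                     /* done: release the saved text */
    cf->text = NULL;
    cf->len = cf->pos = 0;
    return (int)n;
}
```

A caller would loop on chunk_fmt_next(), writing out each full buffer,
until the return value is non-negative.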

Les Mikesell
  les@chinet.chi.il.us

cechew@bruce.cs.monash.OZ.AU (Earl Chew) (11/17/90)

In <880@jonlab.UUCP> jon@jonlab.UUCP (Jon H. LaBadie) writes:

>If in the stdio buffer I have the following;

>	Mary had a big sheep.  Supercalifragalisticexpalidocious ...
>	^

>With my pointer (read in this case) on the 'M', after I fread 23 bytes,
>so my buffer and pointer are such:

>	Mary had a big sheep.  Supercalifragalisticexpalidocious ...
>	                       ^

>what is wrong with fwrite'ing "Jack and Jill" on top of Super...?
>I.e. what is critical about returning to some ground zero state before
>making a transition?

With `traditional' stdio implementations, the FILE will still be in `READING'
mode --- despite the fact that the fwrite() (or putc, etc) may `apparently'
succeed. However, when the buffer is exhausted, you will find that no write(2)
is performed (ie the fact that the buffer is dirty is not recorded) because of
the `READING' mode, and you will lose the data you wrote.
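
A toy version of that lost write, under heavily simplified assumptions: an
in-memory array stands in for the file, the buffer is 8 bytes, and all
names (toy_stream, toy_fill, toy_store) are invented for the sketch.

```c
#include <string.h>

enum { TOY_READING, TOY_WRITING };

/* Toy stream over an in-memory "file"; a simplified assumption of how a
 * traditional stdio behaves, not code from any real library. */
struct toy_stream {
    char *file;          /* stands in for the data on disk */
    size_t file_off;     /* offset of the buffer within file */
    char buf[8];
    size_t pos;          /* position within buf */
    int mode;
};

/* Load the buffer at offset `off`; in READING mode nothing is written back. */
static void toy_fill(struct toy_stream *s, size_t off)
{
    if (s->mode == TOY_WRITING)          /* only a writer flushes */
        memcpy(s->file + s->file_off, s->buf, sizeof s->buf);
    s->file_off = off;
    memcpy(s->buf, s->file + off, sizeof s->buf);
    s->pos = 0;
}

/* Fast-path store that, like the macros discussed, never checks the mode. */
static void toy_store(struct toy_stream *s, char c)
{
    s->buf[s->pos++] = c;
}
```

A toy_store() made while the mode is still TOY_READING changes the buffer,
so it appears to succeed, but the next toy_fill() discards the buffer
without copying it back and the underlying data never changes.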

Earl
-- 
Earl Chew, Dept of Computer Science, Monash University, Australia 3168
EMAIL: cechew@bruce.cs.monash.edu.au PHONE: 03 5655447 FAX: 03 5655146
----------------------------------------------------------------------