[comp.sys.atari.st] read past EOF?

nowlin@ihuxy.UUCP (06/04/87)

I've run into a problem with the Fread() gemdos macro.  Using Megamax I
wrote a piece of code to buffer line-by-line input.  The code reads files
in 8K blocks.  There's no problem as long as the call to Fread() requests
fewer bytes than remain in the file being read.  For example, if I read
1000 bytes from a 3000 byte file, Fread() returns 1000 and there are
exactly 1000 characters stored in the buffer specified.  If there are only
300 bytes in the file and I request 1000, Fread() will return 301 and the
301st byte is a ^Z.  Since ^Z is the EOF character in gemdos, it appears
that Fread() is actually reading one character past/into the end of file
and sticking that character at the end of the input buffer.

Please don't reply with workaround solutions to this.  I've already worked
around it.  I just wanted to see if anybody else has run into this problem
and if they've found a real solution for it.  In most cases this isn't much
of a problem but it's something to watch out for when comparing files.  It
looks like the files are actually starting to diverge from each other when
in fact one has ended.  These are two entirely different things and this
bug makes the latter trickier to detect.

I'm also not sure if this is only related to Megamax or is a bug in the
gemdos call itself.  Megamax seems to think it's gemdos.  I'm not sure
since Megamax has to provide the gemdos() library call that's used to
invoke the lower level routines and that could be where the problem is
occurring.  Has anyone seen this problem with any of the other C
development systems?

Jerry Nowlin
(...!ihnp4!ihuxy!nowlin)

apratt@atari.UUCP (Allan Pratt) (06/05/87)

in article <1987@ihuxy.ATT.COM>, nowlin@ihuxy.ATT.COM (Jerry Nowlin) says:
> If
> there are only 300 bytes in the file and I request 1000, Fread() will return
> 301 and the 301st byte is a ^Z.  Since ^Z is the EOF character in gemdos, it
> appears that Fread() is actually reading one character past/into the end of
> file and sticking that character at the end of the input buffer.

^Z doesn't mean EOF to GEMDOS: it means EOF to some braindamaged programs.
Programs like Mince, for instance.

If there is a ^Z at the end of your file, it is considered a character
just like any other character to GEMDOS.  It's up to the library to interpret
^Z as EOF.  There is a remark in the GEMDOS documentation to the effect
that "Some applications use ^Z to mark EOF in text files."  The intent
of this was that you should write your programs to be liberal: when
dealing with a text file, don't be surprised to see ^Z, but don't be
surprised not to.  Personally, I prefer no ^Z, because you already know
exactly how many bytes there are in a file.  But as far as GEMDOS is
concerned, files are untyped: there is no concept of a "text" file
versus a "binary file" -- all files are just a collection of bytes.
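
A minimal sketch of the "be liberal" advice, in portable ANSI C rather than
Megamax C.  The name text_length() and the choice to end the text at the
first ^Z (the CP/M-style convention) are assumptions made for illustration;
nothing here is a GEMDOS or library call.

```c
#include <stddef.h>

#define CTRL_Z 0x1A   /* ASCII SUB, used by some programs as an end-of-text mark */

/* Hypothetical helper: return how many bytes of buf[0..len) are text.
 * If a ^Z is present, the text ends just before the first one;
 * if no ^Z is present, all len bytes are text. */
size_t text_length(const char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (buf[i] == CTRL_Z)
            return i;       /* marker found: text stops here */
    }
    return len;             /* no marker: the byte count alone decides */
}
```

A reader that passes each block it gets back from the read call through
something like this behaves the same whether or not the file's author
appended a ^Z.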

Some history:  ^Z came to mean EOF because in CP/M, files were allocated in
clusters of multiples of 128 bytes.  In the directory entry for a file,
there was no indication of *exactly* how many bytes were in the file.
This didn't matter for programs (binary files), but it did matter for
text files.  So ^Z was used as the end-of-text marker in text files.

MuShDOS preserved this braindamage, even though it didn't need it because
it DOES keep track of EXACTLY how many bytes were written to a file.
GEMDOS doesn't even document any special treatment of ^Z, but some
programs use it to mark EOF in text files (e.g. Mince).

To address Mr. Nowlin's remarks specifically: how do you know there are
300 bytes left in the file?  Maybe there were 300 text bytes left and
one EOF byte (^Z).  Here's one way to find out how many bytes are REALLY
left in the file:

long bytes_left_in_file(fd)
int fd;
{
	long pos = Fseek(0L,fd,1);		/* get current offset */
	long end = Fseek(0L,fd,2);		/* seek to end, get offset */

	Fseek(pos,fd,0);			/* seek back to pos */
	return end-pos;
}
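
For anyone trying this away from an ST, roughly the same check can be
sketched with the standard C stdio calls.  This is an analog, not the
GEMDOS code above: bytes_left_in_stream() is a made-up name, and it assumes
the stream was opened in binary mode so that ftell() offsets are byte counts.

```c
#include <stdio.h>

/* Hypothetical stdio analog of bytes_left_in_file():
 * returns the number of bytes between the current position and the
 * end of the stream, or -1L if any seek/tell call fails.
 * The stream position is restored before returning. */
long bytes_left_in_stream(FILE *fp)
{
    long pos, end;

    pos = ftell(fp);                        /* get current offset */
    if (pos < 0)
        return -1L;
    if (fseek(fp, 0L, SEEK_END) != 0)       /* seek to end */
        return -1L;
    end = ftell(fp);                        /* offset of end of file */
    if (fseek(fp, pos, SEEK_SET) != 0)      /* seek back to pos */
        return -1L;
    if (end < 0)
        return -1L;
    return end - pos;
}
```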


/----------------------------------------------\
| Opinions expressed above do not necessarily  |  -- Allan Pratt, Atari Corp.
| reflect those of Atari Corp. or anyone else. |     ...lll-lcc!atari!apratt
\----------------------------------------------/

apratt@atari.UUCP (Allan Pratt) (06/05/87)

in article <749@atari.UUCP>, apratt@atari.UUCP (Allan Pratt) says:
> 
> ^Z doesn't mean EOF to GEMDOS: it means EOF to some braindamaged programs.
> Programs like Mince, for instance.
> 

Okay, sorry, I didn't mean braindamaged programs; just those written
for CP/M and preserving that silly (my opinion) convention through
MS-DOS and GEMDOS.

braner@batcomputer.UUCP (06/07/87)

When I buffer I/O myself (see the source code for my uu*code, MORE, etc.;
oops, those are in assembly language, not C; in C, see my version of
microEMACS), I first read the file attributes (using Fsetdta() and
Fsfirst()) and isolate the file size.  Later, when reading the file in
chunks (of 4.5K, 9K, or 18K), I keep track of the amount remaining to be
read.  In the last call to Fread() I only ask for the amount that I know is
there!  (Why all the trouble? - 'cause this is _much_ faster than using the
byte-by-byte UNIX-like fopen(),fread()...)
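
The bookkeeping described above might look something like the following
sketch.  It is written with portable stdio calls so it can be tried
anywhere; on the ST the read would be Fread() on a GEMDOS handle and the
size would come from the DTA filled in by Fsfirst().  The names
read_known_size() and consume are hypothetical, and the caller is assumed
to have obtained the file size already.

```c
#include <stdio.h>

/* Hypothetical helper: read `size` bytes from fp in requests of at most
 * chunk_size bytes, never asking for more than is known to remain.
 * Each block is handed to consume() if one is supplied.
 * Returns the total number of bytes read, or -1L if the file turns out
 * to be shorter than `size` or a read fails. */
long read_known_size(FILE *fp, long size, char *chunk, long chunk_size,
                     void (*consume)(const char *data, long n))
{
    long remaining = size;
    long total = 0;

    while (remaining > 0) {
        /* the last request is trimmed to exactly what is left */
        long want = remaining < chunk_size ? remaining : chunk_size;
        long got = (long)fread(chunk, 1, (size_t)want, fp);

        if (got <= 0)
            return -1L;
        if (consume)
            consume(chunk, got);
        total += got;
        remaining -= got;
    }
    return total;
}
```

Because the final request is cut down to the remaining count, the reader
never depends on how the read call behaves at end of file.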

- Moshe Braner