[net.lang.c] ANSI draft - seeking to eof

matt@prism.UUCP (11/27/85)

The latest (November 11) draft from X3J11 says this about fseek():

	int fseek (FILE *stream, long offset, int ptrname)
	
	A binary stream need not meaningfully support fseek
	calls with a ptrname value of SEEK_END.

	For a text stream, either offset must be zero, or
	offset must be a value returned by an earlier call
	to ftell on the same stream and ptrname must be 
	SEEK_SET.			
				[X3J11/85-138, page 109]

Does this really mean that there's no guaranteed way to seek to the end of
either a text or a binary stream in ANSI C (when such a beast exists)?

------------------------------------------------------------------------------
 Matt Landau      	 {cca, ihnp4!inmet, mit-eddie, wjh12} !mirror!matt
 Mirror Systems, Inc.	 2076 Massachusetts Avenue
			 Cambridge, MA   02140

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (11/30/85)

> The latest (November 11) draft from X3J11 says this about fseek():
> 
> 	int fseek (FILE *stream, long offset, int ptrname)
> 	
> 	A binary stream need not meaningfully support fseek
> 	calls with a ptrname value of SEEK_END.
> 
> 	For a text stream, either offset must be zero, or
> 	offset must be a value returned by an earlier call
> 	to ftell on the same stream and ptrname must be 
> 	SEEK_SET.			
> 				[X3J11/85-138, page 109]
> 
> Does this really mean that there's no guaranteed way to seek to the end of
> either a text or a binary stream in ANSI C (when such a beast exists)?

It says you can seek to the end of a text stream but not a binary
stream.  This is interesting; one wonders what systems they had
in mind that can do the one but not the other.

By the way, UNIX I/O is meant to qualify as a "text stream".  There
was a problem in the wording that we turned up in P1003 (it apparently
would require all UNIX files to end with a newline byte); I don't
know whether X3J11 has fixed this by now, but I sure hope so.

jsdy@hadron.UUCP (Joseph S. D. Yao) (11/30/85)

In article <189@brl-tgr.ARPA> gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) writes:
>> The latest (November 11) draft from X3J11 says this about fseek():
>> 	A binary stream need not meaningfully support fseek
>> 	calls with a ptrname value of SEEK_END.
>> 	For a text stream, either offset must be zero, or
>> 	                            ... ptrname must be 
>> 	SEEK_SET.			
>> 				[X3J11/85-138, page 109]
>It says you can seek to the end of a text stream but not a binary
>stream.  This is interesting; one wonders what systems they had
>in mind that can do the one but not the other.

OK, offset may be zero and ptrname may be SEEK_END for text files.
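
As a minimal sketch of that reading of the draft: a zero offset with
SEEK_END is the one end-relative seek the quoted wording allows on a text
stream.  The helper name seek_to_text_end is hypothetical, and the value it
gets back from ftell() should be treated only as something to hand to a
later fseek(), not as a byte count.

```c
#include <stdio.h>

/* Hypothetical helper, not from the draft: position a text stream at
 * its end using the one SEEK_END form the quoted wording permits
 * (offset zero).  Returns the ftell() value there, or -1L on failure. */
long seek_to_text_end(FILE *fp)
{
    if (fseek(fp, 0L, SEEK_END) != 0)
        return -1L;
    return ftell(fp);
}
```

Any nonzero offset with SEEK_END on a text stream falls outside what the
quoted passage promises, so the sketch deliberately takes no offset argument.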

It seems to me that the problem with binary files on toy "operating
systems" like CP/M and MS-DOS (;-)flame repellent;-)) is not so
much that they can't seek to EOF as that EOF is indeterminate.
I.e., do you want to seek to the end of the last block, or to some
byte that might possibly mark EOF, or to the end of the data in
the file (indeterminate!!!), or what?  I've noticed this problem
when using different utilities on the same file (or the same utility
on different files) on my "toy" (;-)) machines.  By analogy, you
might think of tar files, if you've ever od'ed any, which contain
a multitude of garbage in the last part of the last block which is
only ignored due to the fact that the size is in the header -- which
is not necessarily the case in toy OS's.
-- 

	Joe Yao		hadron!jsdy@seismo.{CSS.GOV,ARPA,UUCP}

henry@utzoo.UUCP (Henry Spencer) (12/01/85)

> Does this really mean that there's no guaranteed way to seek to the end of
> either a text or a binary stream in ANSI C (when such a beast exists)?

Only the kludgey one:  read the whole file, and do an ftell() when you hit
the end; then you can seek back there.  Maybe.  Remember that many operating
systems find it very difficult to do a seek in the Unix sense:  their files
simply do not support such an operation in any straightforward way.  Most
programs that want to do non-trivial file manipulation just cannot be
written in such a way as to port to such machines gracefully.  There is no
good way around it.  The minimal X3J11 semantics for seek reflect this.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry
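
The "kludgey" approach described above might look roughly like the sketch
below.  The function name find_end_by_reading is made up for illustration,
and the caller is assumed to have positioned the stream at its start.  It
only uses getc(), ferror(), and ftell(); whether the result is usable still
depends on the host system supporting ftell()/fseek() on that stream at all.

```c
#include <stdio.h>

/* Hypothetical helper: consume the stream until end-of-file and report
 * the position reached, or -1L on a read error or ftell() failure.
 * The returned value is meant to be passed back to
 * fseek(fp, value, SEEK_SET) later; on a text stream it should not be
 * interpreted as a character count. */
long find_end_by_reading(FILE *fp)
{
    int c;

    while ((c = getc(fp)) != EOF)
        ;                       /* discard everything up to EOF */
    if (ferror(fp))
        return -1L;
    return ftell(fp);
}
```

A caller would save the result once and then use
fseek(fp, saved, SEEK_SET) to get back to the end.  The cost is a full pass
over the file, which is why it is a last resort.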

golde@uw-beaver (Helmut Golde) (12/03/85)

One problem that I have noticed on non-UNIX machines (MS-DOS) is that of the
newline character.  Unlike UNIX, MS-DOS uses two characters, CR and LF, to
mark line boundaries.  One common way of dealing with this is translating
CR-LF pairs into a single LF when reading and doing the reverse when writing.
One big drawback is that seeking no longer works as you expect -- seeking
over a line boundary moves two characters forward in the file, instead of
one as in UNIX.

Does the ANSI draft have any better solution to this problem, or is this
kludge the accepted way of dealing with it?
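
One way to stay inside the draft wording quoted at the top of the thread is
never to compute text-stream offsets by counting characters, and instead to
bookmark positions with ftell() and return to them with SEEK_SET.  The
sketch below assumes that approach; mark_after_lines is a hypothetical
name, not something from the draft.

```c
#include <stdio.h>

/* Hypothetical helper: skip `n` lines from the current position and
 * return an ftell() bookmark for the start of the following line
 * (-1L if ftell() fails).  The bookmark is opaque: on a system that
 * translates CR-LF to LF it need not equal the number of characters
 * read so far. */
long mark_after_lines(FILE *fp, int n)
{
    int c;

    while (n > 0 && (c = getc(fp)) != EOF)
        if (c == '\n')
            n--;
    return ftell(fp);
}
```

Later, fseek(fp, mark, SEEK_SET) should return to the same line start
whether the library stores one character or two at each line boundary,
since the program never does arithmetic on the value.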

jimc@ucla-cs.UUCP (12/09/85)

In article <189@brl-tgr.ARPA> gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) writes:
>> The latest (November 11) draft from X3J11 says this about fseek():
>> 	int fseek (FILE *stream, long offset, int ptrname)
>> 	A binary stream need not meaningfully support fseek
>> 	calls with a ptrname value of SEEK_END.
>> 				[X3J11/85-138, page 109]
>
>It says you can seek to the end of a text stream but not a binary
>stream.  This is interesting; one wonders what systems they had
>in mind that can do the one but not the other.
>
For one, CP/M records the ending block of a file but not the byte within
the block.  If you want to know where the end is, you (the library routine)
have to insert some marker, like ^Z for text, and look for it on reading, or
search the last block for it on fseek(,,SEEK_END).  Obviously this doesn't
work in binary files, where a ^Z byte can be ordinary data.  Often the
application program has records of the form: byte count, record type, binary
data; and one of the types means EOF.

James F. Carter            (213) 206-1306
UCLA-SEASnet; 2567 Boelter Hall; 405 Hilgard Ave.; Los Angeles, CA 90024
UUCP:...!{ihnp4,ucbvax,{hao!cepu}}!ucla-cs!jimc  ARPA:jimc@locus.UCLA.EDU
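
To make the ^Z search concrete, here is a rough sketch of what a library
might do for a text file.  It works on an in-memory copy of the file as
stored on disk, so it can be tried anywhere.  The name text_logical_length
and both macros are illustrative, and the 128-byte record size is an
assumption about the CP/M layout, not something stated in the post.

```c
#include <stddef.h>

#define CPM_RECORD_SIZE 128     /* assumed CP/M record size */
#define CPM_EOF_MARK    0x1A    /* ^Z */

/* Hypothetical helper: given the stored image of a text file (whole
 * records, `len` bytes), scan only the last record for the first ^Z
 * and return the logical length.  If no ^Z is found, the file is
 * taken to fill its last record completely. */
size_t text_logical_length(const unsigned char *image, size_t len)
{
    size_t start, i;

    if (len == 0)
        return 0;
    start = (len >= CPM_RECORD_SIZE) ? len - CPM_RECORD_SIZE : 0;
    for (i = start; i < len; i++)
        if (image[i] == CPM_EOF_MARK)
            return i;
    return len;
}
```

Applied to binary data, the same scan would cut the file short at any 0x1A
byte that happens to fall in the last record, which is the failure
described above.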