[comp.lang.c] Distorting fseek semantics

dhesi@bsu-cs.UUCP (01/01/70)

In article <6423@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) 
writes:
     These attacks on the competence of X3J11 do nothing but show how
     little the attackers understand about the issues involved.

I have no intention of attacking the competence of X3J11 and I apologise
if I gave that impression.
-- 
Rahul Dhesi         UUCP:  <backbones>!{iuvax,pur-ee,uunet}!bsu-cs!dhesi

dhesi@bsu-cs.UUCP (Rahul Dhesi) (09/10/87)

In article <8560@utzoo.UUCP> henry@utzoo.UUCP (Henry Spencer) writes:
>Stdio includes fread, fwrite, and fseek.  The X3J11 drafts do
>put some restrictions on portable uses of them, which are inevitable given
>that the full generality of something like Unix seeks is unimplementable
>on some systems.

I realize I'm in the minority, but ANSI did something wrong here.  ANSI
is supposed to be standardizing an existing language.  Even the drastic
new feature of function prototypes is (a) upward compatible with all or
most existing software and (b) widely believed to be badly needed.  No
such justification exists for crippling the beautiful and simple
semantics of fseek that have been in use for many years.

ANSI had a simple choice:  (a) Leave fseek as it is, with the result
that some vendors would not be able to honestly claim conformance with
the ANSI standard until they modified their operating systems to
support a generalized seek;  (b) Change fseek so such vendors would not
have to work so hard.

The portability argument is a red herring.  ANSI is free to add an
appendix that describes a weaker fseek, in which one cannot directly
go where one has not sequentially gone before, that nonconforming C
implementations can provide.  Software developers who really want to
support all systems, including the ones whose developers refuse to fix
their punched-card-based designs, could restrict themselves to this
weaker specification.  The rest of us would be able to write programs
as we've been writing them for a decade without being accused of not
conforming to ANSI specs.

C compilers for UNIX, MS-DOS, AmigaDOS, Macintosh, CP/M, Minix, OS/2,
and numerous other systems support a generalized fseek.  Even VAX/VMS,
which is heavily into record-based I/O, supports stream-LF files that
allow the original fseek semantics to be preserved.  There is no
reason, other than the practical realization that it's more profitable
to channel resources into persuading ANSI than into changing the
operating system, that other vendors cannot do the same thing.
-- 
Rahul Dhesi         UUCP:  <backbones>!{iuvax,pur-ee,uunet}!bsu-cs!dhesi

guy@sun.uucp (Guy Harris) (09/11/87)

> I realize I'm in the minority, but ANSI did something wrong here.  ANSI
> is supposed to be standardizing an existing language. ...
> No such justification exists for crippling the beautiful and simple
> semantics of fseek that have been in use for many years.

Make that "have been in use on *some* systems for many years".  I agree, the
UNIX semantics of "fseek" are wonderful and beautiful and all that irrelevant
Mom-and-apple-pie stuff, but they aren't always implementable on non-UNIX
systems.

> ANSI had a simple choice:  (a) Leave fseek as it is,

Here is "fseek" "as it is", from the document "A New Input-Output Package", by
D. M. Ritchie, Bell Laboratories, Murray Hill, New Jersey 07974:

	fseek(ioptr, offset, ptrname) FILE *ioptr; long offset

	The location of the next byte in the stream named by "ioptr" is
	adjusted.  "Offset" is a long integer.  If "ptrname" is 0, the offset
	is measured from the beginning of the file; if "ptrname" is 1, the
	offset is measured from the current read or write pointer; if
	"ptrname" is 2, the offset is measured from the end of the file.  The
	routine accounts properly for any buffering.  (When this routine is
	used on non-Unix systems, the offset must be a value returned from
	"ftell" and the ptrname must be 0).

The only differences between this and what appears in the August 3, 1987 ANSI C
draft are that:

	1) DMR's description didn't mention the possibility of an "offset" of
	   0 being used as a portable "rewind" function; perhaps the intent
	   was that "rewind" be used for this, because the cited document does
	   not state that "rewind(f)" is equivalent to "fseek(f, 0L, 0)".

	2) DMR's description doesn't allow for the "offset" being a byte
	   ordinal number on binary files - but his description didn't even
	   *mention* binary files; it didn't describe the "b" flag to "fopen".

So ANSI *did* leave "fseek" as it is *in descriptions of it as a C language
routine*; they didn't "change 'fseek' so (vendors with OSes where it can't act
as a generalized seek) would not have to work so hard".  They didn't give the
description of "fseek" *as a UNIX library routine*, but X3J11 is not a UNIX
interface standard!

Actually, given point 2) there, you could argue that they made it *more* like
the UNIX "fseek" than Dennis' paper did.  You *do* have the ability to deal
with the file as an ordered sequence of bytes; however, to do so you must open
the file as a binary file, which means you won't see UNIX-style lines unless
the native OS implements them.  (For instance, such a file could be treated in
a record-oriented OS as a sequence of 512-byte records.)  As such, you *can*
port programs of the sort you're used to writing on UNIX to those other
systems *as long as you use the "b" option to "fopen" and as long as you're
willing to accept that these files may be in a private format comprehensible
only to other C programs or programs that know about this format*.  You just
can't be guaranteed to do this sort of thing on *text* files.
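
A minimal sketch of the one kind of text-stream seek that both DMR's
description and the draft allow: go back to a position that "ftell" reported
earlier, with the origin set to the beginning of the file (0, spelled SEEK_SET
in the draft).  The function names "read_line_marked" and "return_to_mark" are
made up for this illustration and come from no library.

```c
#include <assert.h>
#include <stdio.h>

/* Read one line into buf and report where that line started.
 * The return value is whatever ftell() said before the read, or -1L if
 * the position could not be obtained or nothing could be read.  On a
 * text stream, treat the value as an opaque cookie, not a byte count. */
long read_line_marked(FILE *fp, char *buf, size_t size)
{
    long mark;

    assert(fp != NULL && buf != NULL && size > 0);
    mark = ftell(fp);
    if (mark == -1L || fgets(buf, (int)size, fp) == NULL)
        return -1L;
    return mark;
}

/* Go back to a position previously reported by ftell() on this stream.
 * Returns 0 on success, nonzero on failure, exactly as fseek() does. */
int return_to_mark(FILE *fp, long mark)
{
    assert(fp != NULL);
    return fseek(fp, mark, SEEK_SET);
}
```

A caller keeps the value returned for a line of interest and later hands it to
"return_to_mark" to re-read that line.  No arithmetic is done on the saved
value, which is what keeps the idiom usable on systems where a text file is
not a byte array.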

> The portability argument is a red herring.  ANSI is free to add an
> appendix that describes a weaker fseek, in which one cannot directly
> go where one has not sequentially gone before, that nonconforming C
> implementations can provide.  Software developers who really want to
> support all systems, including the ones whose developers refuse to fix
> their punched-card-based designs, could restrict themselves to this
> weaker specification.  The rest of us would be able to write programs
> as we've been writing them for a decade without being accused of not
> conforming to ANSI specs.

If this were done, there would be a lot fewer compliant implementations out
there, so people who were interested in writing not just standard-conforming
but code that was *in practice* portable, would conform to the *de facto*
standard formed by replacing the standard's "fseek" by the one described in
this appendix.  In effect, this would mean that the *de facto* ANSI C
standard, as opposed to the *de jure* ANSI C standard, would not include a
UNIX-flavored "fseek".  What has this bought you?

> C compilers for UNIX, MS-DOS, AmigaDOS, Macintosh, CP/M, Minix, OS/2,
> and numerous other systems support a generalized fseek.

UNIX and Minix are red herrings here; those systems implement UNIX-compatible
I/O.  If any of those operating systems store lines UNIX-style, with a single
end-of-line character, implementing UNIX-style "fseek" isn't difficult, as the
translation between native and C lines does not change the number of bytes in
a record.  (I infer from the Lightspeed C manual that the Macintosh puts CR
rather than LF at the end of the line, so C implementations on the Mac can
provide UNIX-style "fseek".)

I have no idea how UNIX-like the "fseek" on MS-DOS C implementations really
is.  It would seem that the file position would either have to be the ordinal
number of the current byte in the underlying file - in which case, were you to
use UNIX-style "fseek"s, you could conceivably confuse the heck out of the I/O
library by putting the file pointer on the LF of a CR/LF pair - or would have
to be translated to what the byte offset would have been, had MS-DOS used
UNIX-style line formats - in which case, seeks could end up being quite
expensive or require an auxiliary data structure to do the mapping.  Even if
you have this auxiliary data structure, you would either have to keep it
around in permanent storage for all text files, which seems a bit tacky (and
doesn't solve the problem of text files created before this auxiliary data
structure was introduced) or would have to construct it as needed, which could
get expensive.
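
An application-level analogue of the "construct it as needed" option can be
sketched as follows.  This is not what an I/O library would do internally; it
only illustrates the cost involved: one full sequential pass to record an
"ftell" value per line, after which any line can be reached with a seek to a
previously reported position.  The names "line_index", "line_index_build" and
"line_index_seek" are invented for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* One ftell() value per line start, collected during a sequential pass. */
struct line_index {
    long  *pos;
    size_t count;
};

/* Build the index by reading the whole stream once.
 * Returns 0 on success, -1 on failure (the index is then left empty). */
int line_index_build(FILE *fp, struct line_index *ix)
{
    size_t cap = 0;
    long here = 0L;
    int c, at_line_start = 1;

    assert(fp != NULL && ix != NULL);
    ix->pos = NULL;
    ix->count = 0;
    rewind(fp);
    for (;;) {
        if (at_line_start) {
            here = ftell(fp);          /* position before the line's first byte */
            if (here == -1L)
                goto fail;
        }
        c = getc(fp);
        if (c == EOF)
            break;
        if (at_line_start) {           /* the line is non-empty: record it */
            if (ix->count == cap) {
                size_t ncap = cap ? cap * 2 : 16;
                long *np = realloc(ix->pos, ncap * sizeof *np);
                if (np == NULL)
                    goto fail;
                ix->pos = np;
                cap = ncap;
            }
            ix->pos[ix->count++] = here;
        }
        at_line_start = (c == '\n');
    }
    if (ferror(fp))
        goto fail;
    return 0;
fail:
    free(ix->pos);
    ix->pos = NULL;
    ix->count = 0;
    return -1;
}

/* Position the stream at the start of line n (counting from 0). */
int line_index_seek(FILE *fp, const struct line_index *ix, size_t n)
{
    assert(fp != NULL && ix != NULL);
    if (n >= ix->count)
        return -1;
    return fseek(fp, ix->pos[n], SEEK_SET);
}
```

The index lives only in memory and has to be rebuilt for each run, which is
the "could get expensive" case; writing it out next to the file would be the
"permanent storage" case.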

> Even VAX/VMS, which is heavily into record-based I/O, supports stream-LF
> files that allow the original fseek semantics to be preserved.

Which doesn't help you if you feed a non-stream-LF file to a C program as an
input text file; if you can't do that, there is a strong disincentive to write
text-processing applications in C.
-- 
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com (or guy@sun.arpa)

flaps@utcsri.UUCP (09/11/87)

In article <1129@bsu-cs.UUCP> dhesi@bsu-cs.UUCP (Rahul Dhesi) writes:
>The portability argument is a red herring.  ANSI is free to add an
>appendix that describes a weaker fseek, in which one cannot directly
>go where one has not sequentially gone before, that nonconforming C
>implementations can provide.  Software developers who really want to
>support all systems, including the ones whose developers refuse to fix
>their punched-card-based designs, could restrict themselves to this
>weaker specification.  The rest of us would be able to write programs
>as we've been writing them for a decade without being accused of not
>conforming to ANSI specs.

You are missing the point of a standard.  If so many systems will be
supporting only the weaker fseek(), what's the point of having everyone
look at you and nod approvingly that you're following the standard,
when your programs are still not portable?

If many people only support the weaker fseek(), then that's all that's
standardized, despite any official ANSI blessing which you are asking
for.  A blessing gets you nothing.  We're trying to be able to write
portable programs.

ajr, C programmer at large

dhesi@bsu-cs.UUCP (Rahul Dhesi) (09/12/87)

In article <27734@sun.uucp> guy@sun.uucp (Guy Harris) writes:
[first major point summarized here in my words]:
     Dennis Ritchie's description of fseek includes an exception for
     non-UNIX systems, and ANSI's description of fseek largely conforms
     to that exception.

I can't argue this on legalistic grounds, but when vendors have
implemented C on non-UNIX systems, they have always** used the UNIX
implementation as a de facto standard.  A vendor whose version of C is
different from that under UNIX faces a competitive pressure to
conform.  When a user wants to know why a C implementation differs from
the UNIX way, it's probably not going to be effective for a vendor to
point out the exception that Ritchie made for non-UNIX systems.

But now, the standard to model implementations after will not be UNIX
but the ANSI standard.  To the extent that the ANSI standard weakens
the power of the C standard library, the user will lose.  For example,
the mail delivery agent smail uses a binary search on a sorted text
file containing mail paths.  Unless I'm missing something, such a
binary search will be impossible in a C implementation that conforms to
the ANSI standard and goes no further.
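
For concreteness, here is a sketch of the kind of search in question.  It is
not smail's code.  It assumes the file can be addressed as an array of bytes
(so it seeks to computed offsets and to the end of the file), that lines are
sorted in strcmp order, that the key is the whole line, and that every line
fits in MAXLINE bytes; "find_line", "next_line_start" and MAXLINE are names
invented for the illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAXLINE 256  /* assumption: every line, newline included, fits */

/* Position fp at the first line that starts at or after byte offset off.
 * Returns that line's offset, or -1L on a seek/tell error. */
static long next_line_start(FILE *fp, long off)
{
    int c;

    if (off == 0L)
        return fseek(fp, 0L, SEEK_SET) == 0 ? 0L : -1L;
    if (fseek(fp, off - 1L, SEEK_SET) != 0)
        return -1L;
    while ((c = getc(fp)) != EOF && c != '\n')
        ;                               /* skip to the end of the current line */
    return ftell(fp);
}

/* Return the byte offset of the line equal to key, or -1L if there is
 * no such line or an error occurs. */
long find_line(FILE *fp, const char *key)
{
    char line[MAXLINE];
    long lo = 0L, hi, mid, start;
    int cmp;

    assert(fp != NULL && key != NULL);
    /* Assumption: SEEK_END works and ftell() then yields the file size. */
    if (fseek(fp, 0L, SEEK_END) != 0 || (hi = ftell(fp)) == -1L)
        return -1L;
    /* Invariant: a matching line, if any, starts in [lo, hi). */
    while (lo < hi) {
        mid = lo + (hi - lo) / 2;
        start = next_line_start(fp, mid);
        if (start == -1L)
            return -1L;
        if (start >= hi || fgets(line, sizeof line, fp) == NULL) {
            hi = mid;                   /* no line starts in [mid, hi) */
            continue;
        }
        line[strcspn(line, "\n")] = '\0';
        cmp = strcmp(line, key);
        if (cmp == 0)
            return start;
        if (cmp < 0) {
            if ((lo = ftell(fp)) == -1L) /* resume after this line */
                return -1L;
        } else {
            hi = mid;                   /* this line and all later ones are too big */
        }
    }
    return -1L;
}
```

Every fseek here except the ones to offset 0 uses a number the program
computed rather than one "ftell" handed back, which is exactly the usage at
issue for text streams.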

>If this were done, there would be a lot fewer compliant implementations out
>there, so people who were interested in writing not just standard-conforming
>but code that was *in practice* portable, would conform to the *de facto*
>standard formed by replacing the standard's "fseek" by the one described in
>this appendix.  In effect, this would mean that the *de facto* ANSI C
>standard, as opposed to the *de jure* ANSI C standard, would not include a
>UNIX-flavored "fseek".  What has this bought you?

Those conforming to the de facto fseek would still continue to try to
make it into the de jure fseek.  It's a competitive advantage for a
vendor to be able to claim full compliance with an ANSI standard.  In
the long run, it would be more likely that most vendors would offer the
UNIX-style fseek.  Users would win.

It's quite possible that, had ANSI C existed some years ago, DEC would
have managed to conform to it without having to introduce stream-LF
files.  Users in general would have been losers.

>I have no idea how UNIX-like the "fseek" on MS-DOS C implementations really
>is.  It would seem that the file position would either have to be the ordinal
>number of the current byte in the underlying file - in which case, were you to
>use UNIX-style "fseek"s, you could conceivably confuse the heck out of the I/O
>library by putting the file pointer on the LF of a CR/LF pair - or would have
>to be translated to what the byte offset would have been, had MS-DOS used
>UNIX-style line formats - in which case, seeks could end up being quite
>expensive or require an auxiliary data structure to do the mapping.  

Confession:  I exaggerated about MSDOS.  On the compilers I've tried
you can fseek, but you get to fseek to the nth byte, where the nth byte
is the same byte that you would get if you opened the file as a binary
file.  I think most implementations of stdio under MSDOS simply ignore
all CR characters on a read, so no confusion will result after a
generalized fseek.

Note that a binary search on a text file will still work, which cannot
be said for ANSI's more restrictive fseek.

>> Even VAX/VMS, which is heavily into record-based I/O, supports stream-LF
>> files that allow the original fseek semantics to be preserved.
>
>Which doesn't help you if you feed a non-stream-LF file to a C program as an
>input text file; if you can't do that, there is a strong disincentive to write
>text-processing applications in C.

Not really.  One can still sequentially read any VMS text file.  The output
from the application can be in stream-LF format.

Because of the competitive pressure to conform to UNIX conventions, DEC
has modified most (perhaps all) its utilities that normally use text
files to also accept stream-LF format.

VMS will even load and execute a file in stream-LF format if it has the
same data bytes as a standard executable 512-byte fixed-length record
executable file.  (I couldn't believe my eyes when I saw this.)   DEC
is getting a little closer to embracing the UNIX model, and is no worse
off for it.

This would likely not have happened if the standard to aspire to had
been ANSI C rather than the de facto standard of the UNIX implementation.

I believe at one time ANSI actually allowed a binary file to return
more characters on a read than had been ever written to it.  That such
bizarre behavior could be even considered, let alone included in the
draft, shows how much pressure there must be on ANSI.

SUMMARY:  The weakened fseek in ANSI C will lead to fewer vendors being
pressured into providing the more flexible UNIX-style fseek, without a
compensating gain in portability.  Users will lose.

---
**The only exception I know of, where a vendor did not use the UNIX
standard as a model, was one that had "putfmt" instead of "printf", and
a lot of other unusual functions.  I think it was from Whitesmiths.  I
believe it has been changed since then.
-- 
Rahul Dhesi         UUCP:  <backbones>!{iuvax,pur-ee,uunet}!bsu-cs!dhesi

nather@ut-sally.UUCP (09/12/87)

In article <27734@sun.uucp>, guy@sun.uucp (Guy Harris) writes:
> 
> I have no idea how UNIX-like the "fseek" on MS-DOS C implementations really
> is.  

It's not so bad.  If the file is opened as a text file, it works just like
Unix, since the CR code is removed on input and restored on output.  If,
however, the file is opened in binary, then you must use ftell() to find
out where you are.


-- 
Ed Nather
Astronomy Dept, U of Texas @ Austin
{allegra,ihnp4}!{noao,ut-sally}!utastro!nather
nather@astro.AS.UTEXAS.EDU

gwyn@brl-smoke.UUCP (09/12/87)

In article <1129@bsu-cs.UUCP> dhesi@bsu-cs.UUCP (Rahul Dhesi) writes:
>No such justification exists for crippling the beautiful and simple
>semantics of fseek that have been in use for many years.

Not only does such a justification exist, if you had read the Rationale
document before spouting off, you would know what it is.

	"Whereas a binary file can be treated as an ordered sequence
	of bytes, counting from zero, a text file need not map
	one-to-one to its internal representation (see §4.9.2).
	Thus, only seeks to an earlier reported position are permitted
	for text files."

In other words, text files simply cannot be counted on to support the
UNIX byte-array model, for a variety of technical reasons that were
thoroughly discussed by X3J11 in the process of specifying fseek().

The reason for not requiring SEEK_END be supported for binary streams
is that many systems do not maintain an EOF mark (some use a ^Z byte
followed by all NUL bytes in the last allocated block, some don't even
have that much of a marker).  The Rationale doesn't seem to explain
this particular point; perhaps it should.

Of course, UNIX implementations of fseek() will provide additional
semantics, and POSIX requires this in a couple of ways (identity of
text and binary streams; fseek() inheritance of lseek() semantics).
It is not within X3J11's assigned scope to insist that only UNIX-like
operating systems are valid, even if the committee believed that.

By the way, the idea that UNIX-like semantics be guaranteed and the
C implementation be limited to supporting just one of several
available file types (for example, VAX/VMS text stream type) has been
declared by several vendors to be unacceptable to their customers.
I have no reason to doubt that.

The alternative to the weaker-than-UNIX specification of fseek() would
have been to not require it at all for ANSI C.  I hope it is obvious
that that alternative is considerably less desirable.

gwyn@brl-smoke.UUCP (09/12/87)

In article <1134@bsu-cs.UUCP> dhesi@bsu-cs.UUCP (Rahul Dhesi) writes:
>I believe at one time ANSI actually allowed a binary file to return
>more characters on a read than had been ever written to it.  That such
>bizarre behavior could be even considered, let alone included in the
>draft, shows how much pressure there must be on ANSI.

An implementation-defined number of NUL characters (bytes) is still
allowed to be appended to a binary stream that was earlier written
under the same implementation (not necessarily the same process).
It doesn't matter how "bizarre" you find this, it's a fact that some
operating systems are like that.  It's much better that a programmer
be warned that this can happen than to have him remain ignorant of
such important facts of life.
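
As an illustration of what being warned buys the programmer, a program that
knows its data never ends in a NUL byte can discount any padding when it
measures a binary stream.  This is only a sketch under that assumption (data
that may legitimately end in NUL would need an explicit length stored in the
file instead), and "logical_length" is a name invented here.

```c
#include <assert.h>
#include <stdio.h>

/* Count the bytes in a binary stream, ignoring any run of NUL bytes at
 * the very end.  Reads sequentially from the start, so it needs neither
 * SEEK_END nor a reliable end-of-file mark.  Returns -1L on read error. */
long logical_length(FILE *fp)
{
    long n = 0L, last_nonzero = 0L;
    int c;

    assert(fp != NULL);
    rewind(fp);
    while ((c = getc(fp)) != EOF) {
        n++;
        if (c != '\0')
            last_nonzero = n;   /* length up to and including this byte */
    }
    return ferror(fp) ? -1L : last_nonzero;
}
```

NUL bytes in the middle of the data are still counted; only the trailing run
is dropped.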

Nobody in his right mind would attempt to guarantee that all code
already written for UNIX systems will work unchanged on all other
operating systems.  Binary search on text files is a good example
of exploiting non-portable system characteristics.  It is simply
not true that existing code for that would be portable "if only"
ANSI C would insist that text files be treatable as randomly-
addressable byte arrays, except in the trivial sense that there
would then be fewer conforming implementations to port the code to.
Vendors who can will undoubtedly give the interface to text files
as many UNIX-like properties as possible, since that will cause
their customers the least trouble.  Vendors who can't, won't
anyway; better that they have standard specifications for the
things that they CAN do.

I don't know what you mean by "pressure on ANSI".  The X3J11
committee is trying to produce the most useful feasible C standard.
Does that constitute some sort of "pressure"?

I really wish contributors to this news group would limit their
discussion of the proposed ANSI standard for C to asking questions
and making technical suggestions.  These attacks on the competence
of X3J11 do nothing but show how little the attackers understand
about the issues involved.  It is perhaps worth noting that not
long ago Dennis Ritchie commended the work done by X3J11; I haven't
heard that he's since changed his mind.  Now, there might be someone
out there with a better understanding of C, UNIX, and principles for
elegance in software design than Dennis, but I rather doubt it.

P.S.  Although I'm an X3J11 member, I'm not an official spokesman
for them.  I would like to remark that I'm proud to be associated
with such a group of bright, dedicated people with a broad spectrum
of backgrounds, most or all of whom are more knowledgeable than I
am in various issues directly related to the C standardization
effort.  I doubt that any other approach to C standardization would
produce overall results better than this one.  X3J11 can of course
use constructive input from others; you had one opportunity to
provide that earlier and will have another chance to review and
comment on the revised proposed standard soon (perhaps before the
end of this calendar year).  Please note, however, that the variety
of environments and applications make conflicting demands, so that
often a compromise solution is required for "optimality" (in the
linear-programming sense).

henry@utzoo.UUCP (Henry Spencer) (09/13/87)

> SUMMARY:  The weakened fseek in ANSI C will lead to fewer vendors being
> pressured into providing the more flexible UNIX-style fseek, without a
> compensating gain in portability.  Users will lose.

(I am resisting the temptation of a blow-by-blow answer to the whole
100-line article...)  Repeat after me:  "the purpose of standards committees
is to standardize existing practice, not to try to force goodness and truth
down everyone's throats".  The fact is, existing practice -- in the C
community as a whole rather than the somewhat bigoted and self-centered
Unix subset of it -- is exactly what is being codified in ANSI C.  It has
*never* been true that a portable program could assume full Unix fseek
semantics.  And trying to force all manufacturers to do a Unix-compatible
fseek is just as likely to be a loss for the users, because it will hamper
wide acceptance of the ANSI standard.

> **The only exception I know of, where a vendor did not use the UNIX
> standard as a model...

There are a lot more exceptions than the one you cite; this merely reflects
your limited experience, I'm afraid.  Most vendors would *like* to make
their fseek Unix-compatible, but not all can.
-- 
"There's a lot more to do in space   |  Henry Spencer @ U of Toronto Zoology
than sending people to Mars." --Bova | {allegra,ihnp4,decvax,utai}!utzoo!henry

chips@usfvax2.UUCP (Chip Salzenberg) (09/14/87)

In article <8993@ut-sally.UUCP>, nather@ut-sally.UUCP (Ed Nather) writes:
> In article <27734@sun.uucp>, guy@sun.uucp (Guy Harris) writes:
> > 
> > I have no idea how UNIX-like the "fseek" on MS-DOS C implementations really
> > is.  
> 
> It's not so bad.  If the file is opened as a text file, it works just like
> Unix, since the CR code is removed on input and restored on output.  If,
> however, the file is opened in binary, then you must use ftell() to find
> out where you are.
> -- 
> Ed Nather

Well, almost.  Nice try, Ed.  :-)  A closer model of the truth:

If you open in text mode, then CR's are stripped and the NL's remain as line
terminators.  This makes text-oriented UNIX code work, but it breaks code
that assumes that reading 50 bytes advances your file position by 50.
There is also the difficulty that a Control-Z is considered as an
end-of-file, but fseek(..., 2) doesn't know where the Control-Z (if any) is.

When writing a text mode file, CR's are inserted.  Similar seeking problems.

If you open in binary mode, fseek() and ftell() have the same semantics as
UNIX -- but the g(l)orious MS-DOS file format is visible to the program,
CR's, Control-Z's, and all.
-- 
Chip Salzenberg            UUCP: "uunet!ateng!chip"  or  "chips@usfvax2.UUCP"
A.T. Engineering, Tampa    Fidonet: 137/42    CIS: 73717,366
"Use the Source, Luke!"    My opinions do not necessarily agree with anything.

peter@sugar.UUCP (09/14/87)

In article <1134@bsu-cs.UUCP>, dhesi@bsu-cs.UUCP (Rahul Dhesi) writes:
> **The only exception I know of, where a vendor did not use the UNIX
> standard as a model, was one that had "putfmt" instead of "printf", and
> a lot of other unusual functions.  I think it was from Whitesmiths.  I
> believe it has been changed since then.

Whitesmiths believed the UNIX programmers manual was copyright by AT&T and
thus they couldn't copy the functions described in it.
-- 
-- Peter da Silva `-_-' ...!hoptoad!academ!uhnix1!sugar!peter
--                 'U`      ^^^^^^^^^^^^^^ Not seismo!soma (blush)

gwyn@brl-smoke.ARPA (Doug Gwyn ) (09/21/87)

In article <722@sugar.UUCP> peter@sugar.UUCP (Peter da Silva) writes:
>Whitesmiths believed the UNIX programmers manual was copyright by AT&T and
>thus they couldn't copy the functions described in it.

I don't believe this was an issue; after all, Whitesmiths did provide
many UNIX-compatible functions (even their own UNIXy system, Idris).
From discussions with Whitesmiths personnel, I gather that they
thought that their I/O routines were better-designed (more orthogonal,
etc.), so in the absence of standards (remember, their C system was
the first one available outside UNIX) they decided to provide more
useful routines.  The development of UNIX-like stdio as a de facto
standard occurred later, at which time one could get an implementation
of stdio for Whitesmiths C from Plum-Hall.  I believe Whitesmiths are
committed to providing ANSI-compatible facilities in future releases,
which means including the stdio functions.  (I don't know whether or
not they currently include these.)

ftw@datacube.UUCP (09/21/87)

> gwyn@brl-smoke.UUCP writes:
> In article <722@sugar.UUCP> peter@sugar.UUCP (Peter da Silva) writes:
> >Whitesmiths believed the UNIX programmers manual was copyright by AT&T and
> >thus they couldn't copy the functions described in it.
> 
> I don't believe this was an issue; after all, Whitesmiths did provide
> many UNIX-compatible functions (even their own UNIXy system, Idris).

Whitesmiths has had at least some compatibility with the Unix "stdio"
functions since their 2.2 release in the spring of '83.  Admittedly,
some of it was clunky.

> From discussions with Whitesmiths personnel, I gather that they
> thought that their I/O routines were better-designed (more orthogonal,
> etc.), so in the absence of standards (remember, their C system was
> the first one available outside UNIX) they decided to provide more
> useful routines.

This is true, and a lot of those routines live on in the current compiler
package offerings from Whitesmiths.

> The development of UNIX-like stdio as a de facto
> standard occurred later, at which time one could get an implementation
> of stdio for Whitesmiths C from Plum-Hall.  I believe Whitesmiths are
> committed to providing ANSI-compatible facilities in future releases,
> which means including the stdio functions.  (I don't know whether or
> not they currently include these.)

Whitesmiths closely follows dpANS, and includes very nearly all of the
features/limitations therein in the current version of their compilers.
They are also active in P1003.


				Farrell T. Woods 

Datacube Inc. Systems / Software Group	4 Dearborn Rd. Peabody, Ma 01960
VOICE:	617-535-6644;	FAX: (617) 535-5643;  TWX: (710) 347-0125
UUCP:	ftw@datacube.COM,  ihnp4!datacube!ftw
	{seismo,cbosgd,cuae2,mit-eddie}!mirror!datacube!ftw