dhesi@bsu-cs.UUCP (01/01/70)
In article <6423@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>These attacks on the competence of X3J11 do nothing but show how little the
>attackers understand about the issues involved.

I have no intention of attacking the competence of X3J11 and I apologise if I gave that impression.
--
Rahul Dhesi         UUCP:  <backbones>!{iuvax,pur-ee,uunet}!bsu-cs!dhesi
dhesi@bsu-cs.UUCP (Rahul Dhesi) (09/10/87)
In article <8560@utzoo.UUCP> henry@utzoo.UUCP (Henry Spencer) writes:
>Stdio includes fread, fwrite, and fseek.  The X3J11 drafts do
>put some restrictions on portable uses of them, which are inevitable given
>that the full generality of something like Unix seeks is unimplementable
>on some systems.

I realize I'm in the minority, but ANSI did something wrong here. ANSI is supposed to be standardizing an existing language. Even the drastic new feature of function prototypes is (a) upward compatible with all or most existing software and (b) widely believed to be badly needed. No such justification exists for crippling the beautiful and simple semantics of fseek that have been in use for many years.

ANSI had a simple choice:

(a) Leave fseek as it is, with the result that some vendors would not be able to honestly claim conformance with the ANSI standard until they modified their operating systems to support a generalized seek;

(b) Change fseek so such vendors would not have to work so hard.

The portability argument is a red herring. ANSI is free to add an appendix that describes a weaker fseek, in which one cannot directly go where one has not sequentially gone before, that nonconforming C implementations can provide. Software developers who really want to support all systems, including the ones whose developers refuse to fix their punched-card-based designs, could restrict themselves to this weaker specification. The rest of us would be able to write programs as we've been writing them for a decade without being accused of not conforming to ANSI specs.

C compilers for UNIX, MS-DOS, AmigaDOS, Macintosh, CP/M, Minix, OS/2, and numerous other systems support a generalized fseek. Even VAX/VMS, which is heavily into record-based I/O, supports stream-LF files that allow the original fseek semantics to be preserved. There is no reason, other than the practical realization that it's more profitable to channel resources into persuading ANSI than into changing the operating system, that other vendors cannot do the same thing.
--
Rahul Dhesi         UUCP:  <backbones>!{iuvax,pur-ee,uunet}!bsu-cs!dhesi
guy@sun.uucp (Guy Harris) (09/11/87)
> I realize I'm in the minority, but ANSI did something wrong here.  ANSI
> is supposed to be standardizing an existing language. ...
> No such justification exists for crippling the beautiful and simple
> semantics of fseek that have been in use for many years.

Make that "have been in use on *some* systems for many years". I agree, the UNIX semantics of "fseek" are wonderful and beautiful and all that irrelevant Mom-and-apple-pie stuff, but they aren't always implementable on non-UNIX systems.

> ANSI had a simple choice:  (a) Leave fseek as it is,

Here is "fseek" "as it is", from the document "A New Input-Output Package", by D. M. Ritchie, Bell Laboratories, Murray Hill, New Jersey 07974:

	fseek(ioptr, offset, ptrname) FILE *ioptr; long offset

	The location of the next byte in the stream named by "ioptr" is adjusted. "Offset" is a long integer. If "ptrname" is 0, the offset is measured from the beginning of the file; if "ptrname" is 1, the offset is measured from the current read or write pointer; if "ptrname" is 2, the offset is measured from the end of the file. The routine accounts properly for any buffering. (When this routine is used on non-Unix systems, the offset must be a value returned from "ftell" and the ptrname must be 0).

The only difference between this and what appears in the August 3, 1987 ANSI C draft is that:

1) DMR's description didn't mention the possibility of "offset" being 0 being used as a portable "rewind" function; perhaps the intent was that "rewind" be used for this, because the cited document does not state that "rewind(f)" is equivalent to "fseek(f, 0L, 0)".

2) DMR's description doesn't allow for the "offset" being a byte ordinal number on binary files - but his description didn't even *mention* binary files; it didn't describe the "b" flag to "fopen".
So ANSI *did* leave "fseek" as it is *in descriptions of it as a C language routine*; they didn't "change 'fseek' so (vendors with OSes where it can't act as a generalized seek) would not have to work so hard". They didn't give the description of "fseek" *as a UNIX library routine*, but X3J11 is not a UNIX interface standard!

Actually, given point 2) there, you could argue that they made it *more* like the UNIX "fseek" than Dennis' paper did. You *do* have the ability to deal with the file as an ordered sequence of bytes; however, to do so you must open the file as a binary file, which means you won't see UNIX-style lines unless the native OS implements them. (For instance, such a file could be treated in a record-oriented OS as a sequence of 512-byte records.)

As such, you *can* port programs of the sort you're used to writing on UNIX to those other systems *as long as you use the "b" option to "fopen" and as long as you're willing to accept that these files may be in a private format comprehensible only to other C programs or programs that know about this format*. You just can't be guaranteed to do this sort of thing on *text* files.

> The portability argument is a red herring.  ANSI is free to add an
> appendix that describes a weaker fseek, in which one cannot directly
> go where one has not sequentially gone before, that nonconforming C
> implementations can provide.  Software developers who really want to
> support all systems, including the ones whose developers refuse to fix
> their punched-card-based designs, could restrict themselves to this
> weaker specification.  The rest of us would be able to write programs
> as we've been writing them for a decade without being accused of not
> conforming to ANSI specs.

If this were done, there would be a lot fewer compliant implementations out there, so people who were interested in writing code that was not just standard-conforming but *in practice* portable would conform to the *de facto* standard formed by replacing the standard's "fseek" by the one described in this appendix. In effect, this would mean that the *de facto* ANSI C standard, as opposed to the *de jure* ANSI C standard, would not include a UNIX-flavored "fseek". What has this bought you?

> C compilers for UNIX, MS-DOS, AmigaDOS, Macintosh, CP/M, Minix, OS/2,
> and numerous other systems support a generalized fseek.

UNIX and Minix are red herrings here; those systems implement UNIX-compatible I/O. If any of those operating systems store lines UNIX-style, with a single end-of-line character, implementing UNIX-style "fseek" isn't difficult, as the translation between native and C lines does not change the number of bytes in a record. (I infer from the Lightspeed C manual that the Macintosh puts CR rather than LF at the end of the line, so C implementations on the Mac can provide UNIX-style "fseek".)

I have no idea how UNIX-like the "fseek" on MS-DOS C implementations really is. It would seem that the file position would either have to be the ordinal number of the current byte in the underlying file - in which case, were you to use UNIX-style "fseek"s, you could conceivably confuse the heck out of the I/O library by putting the file pointer on the LF of a CR/LF pair - or would have to be translated to what the byte offset would have been, had MS-DOS used UNIX-style line formats - in which case, seeks could end up being quite expensive or require an auxiliary data structure to do the mapping. Even if you have this auxiliary data structure, you would either have to keep it around in permanent storage for all text files, which seems a bit tacky (and doesn't solve the problem of text files created before this auxiliary data structure was introduced) or would have to construct it as needed, which could get expensive.

> Even VAX/VMS, which is heavily into record-based I/O, supports stream-LF
> files that allow the original fseek semantics to be preserved.

Which doesn't help you if you feed a non-stream-LF file to a C program as an input text file; if you can't do that, there is a strong disincentive to write text-processing applications in C.
--
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com (or guy@sun.arpa)
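
The one text-stream seek that both the quoted Ritchie description and the draft allow, returning to a position that "ftell" reported earlier, might look like the sketch below. It is only an illustration and comes from none of the posters; the name peek_line is hypothetical, and SEEK_SET is assumed to be the draft's name for a "ptrname" of 0.

```c
#include <assert.h>
#include <stdio.h>

/* peek_line is a hypothetical helper, not part of stdio.
   It reads the next line into buf, then puts the stream back where it
   was, so that the same line can be read again.
   Returns 0 on success, -1 on end of file or error. */
int peek_line(FILE *fp, char *buf, int size)
{
    long pos;

    assert(fp != NULL && buf != NULL && size > 0);

    pos = ftell(fp);    /* on a text stream, treat this as an opaque token */
    if (pos == -1L)
        return -1;
    if (fgets(buf, size, fp) == NULL)
        return -1;

    /* Seeking from the start of the file to a value that ftell returned
       earlier is the case the restricted description permits. */
    return fseek(fp, pos, SEEK_SET) == 0 ? 0 : -1;
}
```

Nothing here does arithmetic on the value of pos, so the code does not care whether the implementation's file position is a byte count or some encoded record address.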
flaps@utcsri.UUCP (09/11/87)
In article <1129@bsu-cs.UUCP> dhesi@bsu-cs.UUCP (Rahul Dhesi) writes: >The portability argument is a red herring. ANSI is free to add an >appendix that describes a weaker fseek, in which one cannot directly >go where one has not sequentially gone before, that nonconforming C >implementations can provide. Software developers who really want to >support all systems, including the ones whose developers refuse to fix >their punched-card-based designs, could restrict themselves to this >weaker specification. The rest of us would be able to write programs >as we've been writing them for a decade without being accused of not >conforming to ANSI specs. You are missing the point of a standard. If so many systems will be supporting only the weaker fseek(), what's the point of having everyone look at you and nod approvingly that you're following the standard, when your programs are still not portable? If many people only support the weaker fseek(), then that's all that's standardized, despite any official ANSI blessing which you are asking for. A blessing gets you nothing. We're trying to be able to write portable programs. ajr, C programmer at large
dhesi@bsu-cs.UUCP (Rahul Dhesi) (09/12/87)
In article <27734@sun.uucp> guy@sun.uucp (Guy Harris) writes:
[first major point summarized here in my words]: Dennis Ritchie's description of fseek includes an exception for non-UNIX systems, and ANSI's description of fseek largely conforms to that exception.

I can't argue this on legalistic grounds, but when vendors have implemented C on non-UNIX systems, they have always** used the UNIX implementation as a de facto standard. A vendor whose version of C is different from that under UNIX faces a competitive pressure to conform. When a user wants to know why a C implementation differs from the UNIX way, it's probably not going to be effective for a vendor to point out the exception that Ritchie made for non-UNIX systems.

But now, the standard to model implementations after will not be UNIX but the ANSI standard. To the extent that the ANSI standard weakens the power of the C standard library, the user will lose. For example, the mail delivery agent smail uses a binary search on a sorted text file containing mail paths. Unless I'm missing something, such a binary search will be impossible in a C implementation that conforms to the ANSI standard and goes no further.

>If this were done, there would be a lot fewer compliant implementations out
>there, so people who were interested in writing not just standard-conforming
>but code that was *in practice* portable, would conform to the *de facto*
>standard formed by replacing the standard's "fseek" by the one described in
>this appendix.  In effect, this would mean that the *de facto* ANSI C
>standard, as opposed to the *de jure* ANSI C standard, would not include a
>UNIX-flavored "fseek".  What has this bought you?

Those conforming to the de facto fseek would still continue to try to make it into the de jure fseek. It's a competitive advantage for a vendor to be able to claim full compliance with an ANSI standard. In the long run, it would be more likely that most vendors would offer the UNIX-style fseek. Users would win. It's quite possible that, had ANSI C existed some years ago, DEC would have managed to conform to it without having to introduce stream-LF files. Users in general would have been losers.

>I have no idea how UNIX-like the "fseek" on MS-DOS C implementations really
>is.  It would seem that the file position would either have to be the ordinal
>number of the current byte in the underlying file - in which case, were you to
>use UNIX-style "fseek"s, you could conceivably confuse the heck out of the I/O
>library by putting the file pointer on the LF of a CR/LF pair - or would have
>to be translated to what the byte offset would have been, had MS-DOS used
>UNIX-style line formats - in which case, seeks could end up being quite
>expensive or require an auxiliary data structure to do the mapping.

Confession: I exaggerated about MSDOS. On the compilers I've tried you can fseek, but you get to fseek to the nth byte, where the nth byte is the same byte that you would get if you opened the file as a binary file. I think most implementations of stdio under MSDOS simply ignore all CR characters on a read, so no confusion will result after a generalized fseek. Note that a binary search on a text file will still work, which cannot be said for ANSI's more restrictive fseek.

>> Even VAX/VMS, which is heavily into record-based I/O, supports stream-LF
>> files that allow the original fseek semantics to be preserved.
>
>Which doesn't help you if you feed a non-stream-LF file to a C program as an
>input text file; if you can't do that, there is a strong disincentive to write
>text-processing applications in C.

Not really. One can still sequentially read any VMS text file. The output from the application can be in stream-LF format. Because of the competitive pressure to conform to UNIX conventions, DEC has modified most (perhaps all) its utilities that normally use text files to also accept stream-LF format. VMS will even load and execute a file in stream-LF format if it has the same data bytes as a standard executable 512-byte fixed-length record executable file. (I couldn't believe my eyes when I saw this.) DEC is getting a little closer to embracing the UNIX model, and is no worse off for it. This would likely not have happened if the standard to aspire to had been ANSI C rather than the de facto standard of the UNIX implementation.

I believe at one time ANSI actually allowed a binary file to return more characters on a read than had been ever written to it. That such bizarre behavior could be even considered, let alone included in the draft, shows how much pressure there must be on ANSI.

SUMMARY: The weakened fseek in ANSI C will lead to fewer vendors being pressured into providing the more flexible UNIX-style fseek, without a compensating gain in portability. Users will lose.

---
**The only exception I know of, where a vendor did not use the UNIX standard as a model, was one that had "putfmt" instead of "printf", and a lot of other unusual functions. I think it was from Whitesmiths. I believe it has been changed since then.
--
Rahul Dhesi         UUCP:  <backbones>!{iuvax,pur-ee,uunet}!bsu-cs!dhesi
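
For reference, here is roughly what a binary search over a sorted text file looks like when every seek goes to a position that "ftell" reported earlier, which is the restricted form under discussion. It is a sketch only, not smail's code: index_lines, find_key and MAXLINE are hypothetical names, the "key, tab, value" line layout is assumed, and lines are assumed to be sorted by byte value and shorter than MAXLINE. The price is one sequential pass to build the table of line positions, which is the "auxiliary data structure" cost raised in the quoted article.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXLINE 512   /* assumption: no line is longer than this */

/* Hypothetical helper: one sequential pass that records the ftell()
   value at the start of every line.  Stores a malloc'd array in *out
   and returns the number of lines, or -1 if memory runs out. */
long index_lines(FILE *fp, long **out)
{
    char buf[MAXLINE];
    long *pos = NULL, *tmp;
    long n = 0, cap = 0, here;

    assert(fp != NULL && out != NULL);
    *out = NULL;
    rewind(fp);
    for (;;) {
        here = ftell(fp);
        if (here == -1L || fgets(buf, sizeof buf, fp) == NULL)
            break;
        if (n == cap) {
            cap = cap ? cap * 2 : 64;
            tmp = realloc(pos, (size_t)cap * sizeof *pos);
            if (tmp == NULL) {
                free(pos);
                return -1;
            }
            pos = tmp;
        }
        pos[n++] = here;
    }
    *out = pos;
    return n;
}

/* Hypothetical helper: binary search for the line whose first
   tab-delimited field equals key.  Returns 1 and leaves the matching
   line in 'line' if found, otherwise returns 0. */
int find_key(FILE *fp, const long *pos, long n,
             const char *key, char *line, int size)
{
    size_t klen = strlen(key);
    long lo = 0, hi = n - 1, mid;
    int cmp;

    while (lo <= hi) {
        mid = lo + (hi - lo) / 2;
        /* Every seek target is a value that ftell reported earlier. */
        if (fseek(fp, pos[mid], SEEK_SET) != 0 ||
            fgets(line, size, fp) == NULL)
            return 0;
        cmp = strncmp(key, line, klen);
        if (cmp == 0) {
            if (line[klen] == '\t' || line[klen] == '\n' || line[klen] == '\0')
                return 1;
            cmp = -1;   /* key is a proper prefix of the field, so it sorts first */
        }
        if (cmp < 0)
            hi = mid - 1;
        else
            lo = mid + 1;
    }
    return 0;
}
```

The search itself is as cheap as the UNIX version; what the restriction costs is the indexing pass, or the bookkeeping needed to keep a stored index in step with the file.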
nather@ut-sally.UUCP (09/12/87)
In article <27734@sun.uucp>, guy@sun.uucp (Guy Harris) writes: > > I have no idea how UNIX-like the "fseek" on MS-DOS C implementations really > is. It's not so bad. If the file is opened as a text file, it works just like Unix, since the CR code is removed on input and restored on output. If, however, the file is opened in binary, then you must use ftell() to find out where you are. -- Ed Nather Astronomy Dept, U of Texas @ Austin {allegra,ihnp4}!{noao,ut-sally}!utastro!nather nather@astro.AS.UTEXAS.EDU
gwyn@brl-smoke.UUCP (09/12/87)
In article <1129@bsu-cs.UUCP> dhesi@bsu-cs.UUCP (Rahul Dhesi) writes:
>No such justification exists for crippling the beautiful and simple
>semantics of fseek that have been in use for many years.

Not only does such a justification exist, if you had read the Rationale document before spouting off, you would know what it is.

	"Whereas a binary file can be treated as an ordered sequence of bytes, counting from zero, a text file need not map one-to-one to its internal representation (see §4.9.2). Thus, only seeks to an earlier reported position are permitted for text files."

In other words, text files simply cannot be counted on to support the UNIX byte-array model, for a variety of technical reasons that were thoroughly discussed by X3J11 in the process of specifying fseek().

The reason for not requiring SEEK_END be supported for binary streams is that many systems do not maintain an EOF mark (some use a ^Z byte followed by all NUL bytes in the last allocated block, some don't even have that much of a marker). The Rationale doesn't seem to explain this particular point; perhaps it should.

Of course, UNIX implementations of fseek() will provide additional semantics, and POSIX requires this in a couple of ways (identity of text and binary streams; fseek() inheritance of lseek() semantics). It is not within X3J11's assigned scope to insist that only UNIX-like operating systems are valid, even if the committee believed that.

By the way, the idea that UNIX-like semantics be guaranteed and the C implementation be limited to supporting just one of several available file types (for example, VAX/VMS text stream type) has been declared by several vendors to be unacceptable to their customers. I have no reason to doubt that.

The alternative to the weaker-than-UNIX specification of fseek() would have been to not require it at all for ANSI C. I hope it is obvious that that alternative is considerably less desirable.
gwyn@brl-smoke.UUCP (09/12/87)
In article <1134@bsu-cs.UUCP> dhesi@bsu-cs.UUCP (Rahul Dhesi) writes:
>I believe at one time ANSI actually allowed a binary file to return
>more characters on a read than had been ever written to it.  That such
>bizarre behavior could be even considered, let alone included in the
>draft, shows how much pressure there must be on ANSI.

An implementation-defined number of NUL characters (bytes) is still allowed to be appended to a binary stream that was earlier written under the same implementation (not necessarily the same process). It doesn't matter how "bizarre" you find this, it's a fact that some operating systems are like that. It's much better that a programmer be warned that this can happen than to have him remain ignorant of such important facts of life.

Nobody in his right mind would attempt to guarantee that all code already written for UNIX systems will work unchanged on all other operating systems. Binary search on text files is a good example of exploiting non-portable system characteristics. It is simply not true that existing code for that would be portable "if only" ANSI C would insist that text files be treatable as randomly-addressable byte arrays, except in the trivial sense that there would then be fewer conforming implementations to port the code to.

Vendors who can will undoubtedly give the interface to text files as many UNIX-like properties as possible, since that will cause their customers the least trouble. Vendors who can't, won't anyway; better that they have standard specifications for the things that they CAN do.

I don't know what you mean by "pressure on ANSI". The X3J11 committee is trying to produce the most useful feasible C standard. Does that constitute some sort of "pressure"?

I really wish contributors to this news group would limit their discussion of the proposed ANSI standard for C to asking questions and making technical suggestions. These attacks on the competence of X3J11 do nothing but show how little the attackers understand about the issues involved. It is perhaps worth noting that not long ago Dennis Ritchie commended the work done by X3J11; I haven't heard that he's since changed his mind. Now, there might be someone out there with a better understanding of C, UNIX, and principles for elegance in software design than Dennis, but I rather doubt it.

P.S. Although I'm an X3J11 member, I'm not an official spokesman for them. I would like to remark that I'm proud to be associated with such a group of bright, dedicated people with a broad spectrum of backgrounds, most or all of whom are more knowledgeable than I am in various issues directly related to the C standardization effort. I doubt that any other approach to C standardization would produce overall results better than this one. X3J11 can of course use constructive input from others; you had one opportunity to provide that earlier and will have another chance to review and comment on the revised proposed standard soon (perhaps before the end of this calendar year). Please note, however, that the variety of environments and applications make conflicting demands, so that often a compromise solution is required for "optimality" (in the linear-programming sense).
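
One common way for a program to live with the trailing NUL bytes mentioned earlier in this article is to store an explicit length in front of the data, so that padding at the end of the stream is never mistaken for content. The sketch below illustrates that idea under stated assumptions; it is not taken from the draft or from any poster. The names write_block and read_block are hypothetical, and the 4-byte big-endian header is an arbitrary choice.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write len bytes preceded by a 4-byte big-endian
   length.  Returns 0 on success, -1 on failure. */
int write_block(FILE *fp, const unsigned char *data, unsigned long len)
{
    unsigned char hdr[4];

    assert(fp != NULL && data != NULL);
    if (len > 0xFFFFFFFFUL)         /* must fit in the 4-byte header */
        return -1;
    hdr[0] = (unsigned char)((len >> 24) & 0xFF);
    hdr[1] = (unsigned char)((len >> 16) & 0xFF);
    hdr[2] = (unsigned char)((len >> 8) & 0xFF);
    hdr[3] = (unsigned char)(len & 0xFF);
    if (fwrite(hdr, 1, 4, fp) != 4)
        return -1;
    if (fwrite(data, 1, (size_t)len, fp) != (size_t)len)
        return -1;
    return 0;
}

/* Hypothetical helper: read back one block written by write_block.
   Returns the number of data bytes, or -1 on failure or if the block
   does not fit in buf.  Bytes after the block, padding included, are
   left unread. */
long read_block(FILE *fp, unsigned char *buf, unsigned long cap)
{
    unsigned char hdr[4];
    unsigned long len;

    assert(fp != NULL && buf != NULL);
    if (fread(hdr, 1, 4, fp) != 4)
        return -1;
    len = ((unsigned long)hdr[0] << 24) | ((unsigned long)hdr[1] << 16) |
          ((unsigned long)hdr[2] << 8)  |  (unsigned long)hdr[3];
    if (len > cap)
        return -1;
    if (fread(buf, 1, (size_t)len, fp) != (size_t)len)
        return -1;
    return (long)len;
}
```

The reader never needs to find the end of the file, so it works whether or not the system keeps an exact byte count for the file.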
henry@utzoo.UUCP (Henry Spencer) (09/13/87)
> SUMMARY: The weakened fseek in ANSI C will lead to fewer vendors being > pressured into providing the more flexible UNIX-style fseek, without a > compensating gain in portability. Users will lose. (I am resisting the temptation of a blow-by-blow answer to the whole 100-line article...) Repeat after me: "the purpose of standards committees is to standardize existing practice, not to try to force goodness and truth down everyone's throats". The fact is, existing practice -- in the C community as a whole rather than the somewhat bigoted and self-centered Unix subset of it -- is exactly what is being codified in ANSI C. It has *never* been true that a portable program could assume full Unix fseek semantics. And trying to force all manufacturers to do a Unix-compatible fseek is just as likely to be a loss for the users, because it will hamper wide acceptance of the ANSI standard. > **The only exception I know of, where a vendor did not use the UNIX > standard as a model... There are a lot more exceptions than the one you cite; this merely reflects your limited experience, I'm afraid. Most vendors would *like* to make their fseek Unix-compatible, but not all can. -- "There's a lot more to do in space | Henry Spencer @ U of Toronto Zoology than sending people to Mars." --Bova | {allegra,ihnp4,decvax,utai}!utzoo!henry
chips@usfvax2.UUCP (Chip Salzenberg) (09/14/87)
In article <8993@ut-sally.UUCP>, nather@ut-sally.UUCP (Ed Nather) writes: > In article <27734@sun.uucp>, guy@sun.uucp (Guy Harris) writes: > > > > I have no idea how UNIX-like the "fseek" on MS-DOS C implementations really > > is. > > It's not so bad. If the file is opened as a text file, it works just like > Unix, since the CR code is removed on input and restored on output. If, > however, the file is opened in binary, then you must use ftell() to find > out where you are. > -- > Ed Nather Well, almost. Nice try, Ed. :-) A closer model of the truth: If you open in text mode, then CR's are stripped and the NL's remain as line terminators. This makes text-oriented UNIX code work, but it breaks code that assumes that if you read 50 bytes, that your file position is advanced by 50. There is also the difficulty that a Control-Z is considered as an end-of-file, but fseek(..., 2) doesn't know where the Control-Z (if any) is. When writing a text mode file, CR's are inserted. Similar seeking problems. If you open in binary mode, fseek() and ftell() have the same semantics as UNIX -- but the g(l)orious MS-DOS file format is visible to the program, CR's, Control-Z's, and all. -- Chip Salzenberg UUCP: "uunet!ateng!chip" or "chips@usfvax2.UUCP" A.T. Engineering, Tampa Fidonet: 137/42 CIS: 73717,366 "Use the Source, Luke!" My opinions do not necessarily agree with anything.
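
A program that opens an MS-DOS text file in binary mode, to get the UNIX-like seek behaviour described above, has to cope with the CRs and the Control-Z itself. A rough sketch of a line reader that does so follows. The name dos_getline is hypothetical, and the policy of dropping every CR and treating the first Control-Z as end-of-file is an assumption about what the program wants, not a description of any particular compiler's library.

```c
#include <assert.h>
#include <stdio.h>

#define CTRL_Z 0x1A   /* MS-DOS end-of-text marker */

/* Hypothetical helper: read one line from a stream opened in binary
   mode on an MS-DOS style text file.  CRs are dropped, the LF is kept,
   and a Control-Z is treated as end of file.
   Returns the number of characters stored, or -1 if there were none. */
int dos_getline(FILE *fp, char *buf, int size)
{
    int c, n = 0;

    assert(fp != NULL && buf != NULL && size > 1);
    while (n < size - 1 && (c = getc(fp)) != EOF) {
        if (c == CTRL_Z) {
            ungetc(c, fp);      /* leave it there so later calls stop here too */
            break;
        }
        if (c == '\r')
            continue;           /* drop the CR of a CR/LF pair */
        buf[n++] = (char)c;
        if (c == '\n')
            break;
    }
    buf[n] = '\0';
    return n > 0 ? n : -1;
}
```

Since the stream is binary, ftell() values taken between calls are plain byte offsets into the underlying file, CRs included, so they will not match the count of characters the caller has seen.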
peter@sugar.UUCP (09/14/87)
In article <1134@bsu-cs.UUCP>, dhesi@bsu-cs.UUCP (Rahul Dhesi) writes:
> **The only exception I know of, where a vendor did not use the UNIX
> standard as a model, was one that had "putfmt" instead of "printf", and
> a lot of other unusual functions.  I think it was from Whitesmiths.  I
> believe it has been changed since then.

Whitesmiths believed the UNIX programmers manual was copyright by AT&T and thus they couldn't copy the functions described in it.
--
-- Peter da Silva  `-_-'  ...!hoptoad!academ!uhnix1!sugar!peter
--                 'U`    ^^^^^^^^^^^^^^ Not seismo!soma (blush)
gwyn@brl-smoke.ARPA (Doug Gwyn ) (09/21/87)
In article <722@sugar.UUCP> peter@sugar.UUCP (Peter da Silva) writes: >Whitesmiths believed the UNIX programmers manual was copyright by AT&T and >thus they couldn't copy the functions described in it. I don't believe this was an issue; after all, Whitesmiths did provide many UNIX-compatible functions (even their own UNIXy system, Idris). From discussions with Whitesmiths personnel, I gather that they thought that their I/O routines were better-designed (more orthogonal, etc.), so in the absence of standards (remember, their C system was the first one available outside UNIX) they decided to provide more useful routines. The development of UNIX-like stdio as a de facto standard occurred later, at which time one could get an implementation of stdio for Whitesmiths C from Plum-Hall. I believe Whitesmiths are committed to providing ANSI-compatible facilities in future releases, which means including the stdio functions. (I don't know whether or not they currently include these.)
ftw@datacube.UUCP (09/21/87)
> gwyn@brl-smoke.UUCP writes: > In article <722@sugar.UUCP> peter@sugar.UUCP (Peter da Silva) writes: > >Whitesmiths believed the UNIX programmers manual was copyright by AT&T and > >thus they couldn't copy the functions described in it. > > I don't believe this was an issue; after all, Whitesmiths did provide > many UNIX-compatible functions (even their own UNIXy system, Idris). Whitesmiths has had at least some compatibility with the Unix "stdio" functions since their 2.2 release in the spring of '83. Admittedly, some of it was clunky. > From discussions with Whitesmiths personnel, I gather that they > thought that their I/O routines were better-designed (more orthogonal, > etc.), so in the absence of standards (remember, their C system was > the first one available outside UNIX) they decided to provide more > useful routines. This is true, and a lot of those routines live on in the current compiler package offerings from Whitesmiths. > The development of UNIX-like stdio as a de facto > standard occurred later, at which time one could get an implementation > of stdio for Whitesmiths C from Plum-Hall. I believe Whitesmiths are > committed to providing ANSI-compatible facilities in future releases, > which means including the stdio functions. (I don't know whether or > not they currently include these.) Whitesmiths closely follows dpANS, and includes very nearly all of the features/limitations therein in the current version of their compilers. They are also active in P1003. Farrell T. Woods Datacube Inc. Systems / Software Group 4 Dearborn Rd. Peabody, Ma 01960 VOICE: 617-535-6644; FAX: (617) 535-5643; TWX: (710) 347-0125 UUCP: ftw@datacube.COM, ihnp4!datacube!ftw {seismo,cbosgd,cuae2,mit-eddie}!mirror!datacube!ftw