osd7@homxa.UUCP (Orlando Sotomayor-Diaz) (03/11/85)
From: Orlando Sotomayor-Diaz (The Moderator) <cbosgd!std-c>

mod.std.c Digest            Mon, 11 Mar 85       Volume 4 : Issue 11

Today's Topics:
    more on "long int file pointers considered harmful"
    Standard File Offsets
    what ftell returns
----------------------------------------------------------------------

Date: Sun, 10 Mar 85 17:12:30 est
From: Steve Ludlum <stevel@haddock.UUCP>
Subject: more on "long int file pointers considered harmful"
To: decvax!minow@cbosgd.ATT.UUCP

If the problem is that a 32 bit int is not big enough then we should have a long long or huge int type, usually 64 bits. However, for 99.9999% of the people I think long is just fine for lseek. The other 0.0001% of the people will be writing data base software and be using operating system calls anyway. The big loss from making it a structure, though, is the "not allow manipulation" of the file pointer. If 2**32 bytes is too small I think this is a specialized application that will need special handling.

I am concerned that in 10 to 15 years there will be the need for huge int and it will not be able to be added. There should at least be a reserved word so it can be added later. However, I think it is too early to implement now, by a decade at least.

------------------------------

Date: Sat, 9 Mar 85 20:46:21 est
From: Stuart Friedberg <seismo!rochester!stuart>
Subject: Standard File Offsets
To: std-c@cbosgd.ATT.UUCP

[ The reply here is to Steve's original article, and not to the one above. -- Mod --]

I would like to respond to Steve Ludlum's comment about what ftell returns. I am not terribly concerned about the need to make it a "non-primitive" of some sort, but I think he rejects the suggestion for the wrong reasons.

Apparently, Steve feels that a 32 bit integer is good enough to indicate a file offset and if a machine can't support 32 bit integers we shouldn't bother with that machine. (Please excuse me if this is a misinterpretation.) I believe this argument is incorrect on two grounds.
1) It doesn't plan for the high end of things. It is not at all inconceivable that people will want files (that at least appear to be) LONGER than 32 unsigned bits worth of bytes.

   1.1) Implementations that (A) allow "holes" in the file as at least BSD does, or (B) extend the file abstraction to things in virtual memory as Multics does, will have users who want to randomly access sparse data scattered all over that 32 or greater bit address space. Note that (A) exists in U**X today, while (B) is something that Berkeley might throw at us someday in the foreseeable future.

   1.2) Applications like commercial databases USE Gigabytes worth of data that is NOT sparse. Suppose I have 10 million records, each 500 bytes long. Surprise! This won't fit in one file with a "mere" 32 bits of offset. Admittedly, I don't know of anyone who would use the existing U**X file system IMPLEMENTATION to create such a database, but I see no reason why a commercial house couldn't do its own careful system implementation and use the existing file abstraction as what the application programmer sees, as opposed to the physical disk layout dependent garbage one typically has to wade through today.

2) It doesn't plan for the low end of things. U**X is moving to a LOT of machines and C is the language of choice for moving it. What about all the micros out there that DON'T SUPPORT 32 BIT NATIVE ARITHMETIC? There is no trouble doing 32 bit adds, it just takes more code. Ever program an 8080 or a Z80? They can't even do most *16 bit* arithmetic right! Obviously people are moving to similar machines and obviously their compilers will support arithmetic on quantities longer than the available native "word". It's not good enough to say "Well, you can't store a 32 bit file offset in a single addressable location, so we just won't port the system to your machine at all".
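
The remark in point 2 that a 32 bit add on an 8 bit machine "just takes more code" can be sketched roughly as below. This is only an illustration in present-day C, not anything taken from the postings: the type `file_off32` and the helpers `off_from_u32`, `off_to_u32`, `off_add` and `self_check` are hypothetical names invented here, and the offset is assumed to be stored as four bytes, least significant first. The last check reuses the numbers from 1.2 (10 million records of 500 bytes) to show the sum running off the top of 32 bits.

```c
#include <assert.h>
#include <stdint.h>

#define OFF_BYTES 4   /* assumed layout: four 8-bit bytes, least significant first */

/* Hypothetical offset type for illustration; not from any real library. */
typedef struct {
    uint8_t b[OFF_BYTES];
} file_off32;

/* Split an ordinary 32-bit value into bytes. */
static file_off32 off_from_u32(uint32_t v)
{
    file_off32 r;
    int i;
    for (i = 0; i < OFF_BYTES; i++) {
        r.b[i] = (uint8_t)(v & 0xFFu);
        v >>= 8;
    }
    return r;
}

/* Reassemble the bytes into an ordinary 32-bit value. */
static uint32_t off_to_u32(file_off32 x)
{
    uint32_t v = 0;
    int i;
    for (i = OFF_BYTES - 1; i >= 0; i--)
        v = (v << 8) | x.b[i];
    return v;
}

/*
 * Add two offsets one byte at a time, propagating a carry, much as an
 * 8-bit CPU would with an add followed by add-with-carry instructions.
 * Returns the carry out of the top byte: 1 means the true sum needs
 * more than 32 bits.
 */
static int off_add(file_off32 a, file_off32 b, file_off32 *sum)
{
    unsigned carry = 0;
    int i;
    for (i = 0; i < OFF_BYTES; i++) {
        unsigned t = (unsigned)a.b[i] + (unsigned)b.b[i] + carry; /* at most 0x1FF */
        sum->b[i] = (uint8_t)(t & 0xFFu);
        carry = t >> 8;
    }
    return (int)carry;
}

/* A few checks, including the record arithmetic from 1.2. */
static void self_check(void)
{
    file_off32 s;

    /* Small values: no carry out. */
    assert(off_add(off_from_u32(500u), off_from_u32(1000u), &s) == 0);
    assert(off_to_u32(s) == 1500u);

    /* Carry ripples across two byte boundaries. */
    assert(off_add(off_from_u32(0xFFFFu), off_from_u32(1u), &s) == 0);
    assert(off_to_u32(s) == 0x10000u);

    /* 2,500,000,000 + 2,500,000,000 = 5,000,000,000 bytes, which is
       10 million records of 500 bytes; this exceeds 2**32. */
    assert(off_add(off_from_u32(UINT32_C(2500000000)),
                   off_from_u32(UINT32_C(2500000000)), &s) == 1);
    assert(off_to_u32(s) == UINT32_C(705032704)); /* 5,000,000,000 - 2**32 */
}
```

Calling `self_check()` from a test program exercises all three cases. Widening the offset would, under these assumptions, only mean raising `OFF_BYTES` and replacing the two `uint32_t` conversion helpers; the add loop itself does not depend on the width.
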
From all this and a lot I left unsaid I would conclude:

1) 32 bits will serve 95+% of today's C environments.

2) It's both too small and too large to support the variety of ports we can expect in the near future.

3) Since the maximum size of a file is tied to the implementation of the file system, NOT to the size of the "int" or the "long", we should make the offset an implementation defined typedef or struct with a "well-known" name (i.e., standardize the name, not the size).

4) To make things really portable we need well-known routines that will take one of these offset types and a signed int or long and return a new offset. Probably there should also be comparison routines and everything else you need to do useful arithmetic.

5) Most applications probably won't use all the portable features even if we provide them, because the programmer *knows* the sizes and implementations of various things.

BOTTOM LINE

6) The whole question of standardizing the "standard IO" library is more about the environment in which we expect to run C programs than the language itself. And on that note, I will point out that big IBM systems will allow a single file to take up an entire MSS3850, which is FAR more than a paltry 4 Gig. Do we give up on C for these environments?

Stu Friedberg
{seismo, allegra}!rochester!stuart
stuart@rochester

------------------------------

Date: 10 Mar 85 00:42:33 CST (Sun)
From: utzoo!henry
Subject: what ftell returns
To: ihnp4!cbosgd!std-c

> It is unreasonable to not allow manipulation of the argument
> passed to fseek. What machines do not have long as at least a 32
> bit int....

There are actually about three separate problems here. First is that non-Unix systems often do not support the Unix file model, and the argument to fseek cannot be a byte offset -- it has to be a "magic cookie" containing more complex information.
In such circumstances, there are no meaningful manipulations you can do on the fseek argument; the only valid argument is something previously obtained from ftell.

Second, 32 bits no longer looks infinite as a file size. There are Unix sites (Chemical Abstracts, for one) which routinely deal with gigabyte files. It would be nice if the standard would leave the nature of the ftell/fseek file position open, to permit graceful growth.

Third is that there are some implementations with 64-bit longs and 32-bit file offsets. In this situation, "long" is the wrong type to feed to fseek. (Please, no flames about the desirability of these sizes for integers; I didn't do it.)

I was most disappointed to see that the ANSI document defined the fseek argument to be "long". At the very least, it should have been left as a symbolically-named integer type. I am not entirely averse to the idea that it should be a symbolically-named something, with the details left entirely up to the implementation. I see nothing that would break.

Henry Spencer @ U of Toronto Zoology
{allegra,ihnp4,linus,decvax}!utzoo!henry

------------------------------

End of mod.std.c Digest - Mon, 11 Mar 85 12:02:43 EST
******************************
USENET -> posting only through cbosgd!std-c.
ARPA -> ... through cbosgd!std-c@BERKELEY.ARPA (NOT to INFO-C)
In all cases, you may also reply to the author(s) above.