[mod.std.c] mod.std.c Digest Volume 4 : Issue 11

osd7@homxa.UUCP (Orlando Sotomayor-Diaz) (03/11/85)
From: Orlando Sotomayor-Diaz (The Moderator) <cbosgd!std-c>


mod.std.c Digest            Mon, 11 Mar 85       Volume 4 : Issue  11 

Today's Topics:
         more on "long int file pointers considered harmful"
                        Standard File Offsets
                          what ftell returns
----------------------------------------------------------------------

Date: Sun, 10 Mar 85 17:12:30 est
From: Steve Ludlum <stevel@haddock.UUCP>
Subject: more on "long int file pointers considered harmful"
To: decvax!minow@cbosgd.ATT.UUCP

If the problem is that a 32 bit int is not big enough then we
should have a long long or huge int type, usually 64 bits.

However for 99.9999% of the people I think long is just fine
for lseek. The other 0.0001% of the people will be writing 
data base software and be using operating system calls anyway.

The big loss from making it a structure, though, is that it would
not allow manipulation of the file pointer.

If 2**32 bytes is too small I think this is a specialized
application that will need special handling. I am concerned
that in 10 to 15 years there will be the need for huge int
and it will not be able to be added. There should at least
be a reserved word so it can be added later. However I think
it is too early to implement now, by a decade at least.

------------------------------

Date: Sat, 9 Mar 85 20:46:21 est
From: Stuart Friedberg  <seismo!rochester!stuart>
Subject: Standard File Offsets
To: std-c@cbosgd.ATT.UUCP

[ The reply here is to Steve's original article, and not to the
one above.  -- Mod --]

I would like to respond to Steve Ludlum's comment about what ftell
returns. I am not terribly concerned about the need to make it a
"non-primitive" of some sort, but I think he rejects the suggestion for
the wrong reasons.

Apparently, Steve feels that a 32 bit integer is good enough to
indicate a file offset and if a machine can't support 32 bit integers
we shouldn't bother with that machine.  (Please excuse me if this is a
misinterpretation.)  I believe this argument is incorrect on two grounds.

1) It doesn't plan for the high end of things.  It is not at all
inconceivable that people will want files (that at least appear to be)
LONGER than 32 unsigned bits worth of bytes.

  1.1)  Implementations that
    (A) allow "holes" in the file as at least BSD does, or
    (B) extend the file abstraction to things in virtual memory as
          Multics does,
  will have users who want to randomly access sparse data scattered
  all over that 32 or greater bit address space. Note that (A) exists
  in U**X today, while (B) is something that Berkeley might throw at
  us someday in the foreseeable future.

  1.2) Applications like commercial databases USE Gigabytes worth of
  data that is NOT sparse.  Suppose I have 10 million records, each 500
  bytes long. Surprise! This won't fit in one file with a "mere" 32
  bits of offset.  Admittedly, I don't know of anyone who would use the
  existing U**X file system IMPLEMENTATION to create such a database,
  but I see no reason why a commercial house couldn't do its own
  careful system implementation and use the existing file abstraction
  as what the application programmer sees as opposed to the physical
  disk layout dependent garbage one typically has to wade through
  today.

2) It doesn't plan for the low end of things. U**X is moving to a LOT
of machines and C is the language of choice for moving it.  What about
all the micros out there that DON'T SUPPORT 32 BIT NATIVE ARITHMETIC?
There is no trouble doing 32 bit adds, it just takes more code.  Ever
program an 8080 or a Z80? They can't even do most *16 bit* arithmetic
right!  Obviously people are moving to similar machines and obviously
their compilers will support arithmetic on quantities longer than the
available native "word". It's not good enough to say "Well, you can't
store a 32 bit file offset in a single addressable location, so we just
won't port the system to your machine at all".
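
The arithmetic behind point 1.2 is easy to check.  What follows is a
minimal sketch in present-day C, not anything from the draft standard:
the <stdint.h> types postdate this discussion, and the names
database_bytes, fits_in_32_bits and MAX_OFFSET_32 are invented here
purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Largest value an unsigned 32-bit file offset can hold: 2**32 - 1. */
#define MAX_OFFSET_32 UINT64_C(4294967295)

/* Total size in bytes of `records` fixed-length records of
   `record_len` bytes each, computed in 64 bits. */
uint64_t database_bytes(uint64_t records, uint64_t record_len)
{
    /* Guard against the 64-bit product itself wrapping. */
    assert(record_len == 0 || records <= UINT64_MAX / record_len);
    return records * record_len;
}

/* Nonzero if the end-of-file position of a file this large is still
   representable as an unsigned 32-bit offset. */
int fits_in_32_bits(uint64_t total_bytes)
{
    return total_bytes <= MAX_OFFSET_32;
}
```

With 10 million records of 500 bytes each, database_bytes gives
5,000,000,000, which is larger than 2**32 - 1 = 4,294,967,295, so
fits_in_32_bits returns 0.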

From all this and a lot I left unsaid I would conclude
1) 32 bits will serve 95+% of today's C environments
2) It's both too small and too large to support the variety of ports
   we can expect in the near future.
3) Since the maximum size of a file is tied to the implementation of
   the file system, NOT to the size of the "int" or the "long", we
   should make the offset an implementation defined typedef or struct
   with a "well-known" name (i.e., standardize the name not the size)
4) To make things really portable we need well-known routines that
   will take one of these offset types and a signed int or long and
   return a new offset.  Probably there should also be comparison
   routines and everything else you need to do useful arithmetic.
5) Most applications probably won't use all the portable features
   even if we provide them, because the programmer *knows* the sizes
   and implementations of various things.

BOTTOM LINE
6) The whole question of standardizing the "standard IO" library is
   more about the environment in which we expect to run C programs than
   the language itself.  And on that note, I will point out that big
   IBM systems will allow a single file to take up an entire MSS3850,
   which is FAR more than a paltry 4 Gig. Do we give up on C for these
   environments?
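
Points 3 and 4 above can be sketched roughly as follows.  This is only
one possible shape, under stated assumptions: the type name file_off and
the routines off_add and off_cmp are hypothetical, no standard or draft
defines them, and the <stdint.h> types are a present-day convenience.
The offset is held as two 32-bit words, so the routines never need
arithmetic wider than 32 bits, which also speaks to point 2 in the
earlier list about machines with a short native word.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical implementation-defined offset: 64 bits stored as two
   32-bit words.  Callers are meant to treat it as opaque. */
typedef struct {
    uint32_t hi;   /* most significant 32 bits */
    uint32_t lo;   /* least significant 32 bits */
} file_off;

/* Return base + delta.  This sketch does not report overflow past
   either end of the 64-bit range; that is left to the caller. */
file_off off_add(file_off base, long delta)
{
    file_off r = base;

    /* Assumption: the displacement fits in 32 bits, even where long
       is wider than that. */
    assert(delta >= -2147483647L - 1 && delta <= 2147483647L);

    if (delta >= 0) {
        uint32_t d = (uint32_t)delta;
        r.lo = base.lo + d;            /* wraps modulo 2**32 */
        if (r.lo < base.lo)            /* carry out of the low word */
            r.hi++;
    } else {
        /* Magnitude of delta, formed without overflowing long. */
        uint32_t d = (uint32_t)(-(delta + 1)) + 1u;
        r.lo = base.lo - d;            /* wraps modulo 2**32 */
        if (base.lo < d)               /* borrow from the high word */
            r.hi--;
    }
    return r;
}

/* Three-way comparison: negative, zero or positive as a is less than,
   equal to or greater than b. */
int off_cmp(file_off a, file_off b)
{
    if (a.hi != b.hi)
        return a.hi < b.hi ? -1 : 1;
    if (a.lo != b.lo)
        return a.lo < b.lo ? -1 : 1;
    return 0;
}
```

An application would obtain a file_off from the library, step it with
off_add, and order positions with off_cmp, without ever depending on
how many bits the implementation chose.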

Stu Friedberg  {seismo, allegra}!rochester!stuart  stuart@rochester

------------------------------

Date: 10 Mar 85 00:42:33 CST (Sun)
From: utzoo!henry
Subject: what ftell returns
To: ihnp4!cbosgd!std-c

> It is unreasonable to not allow manipulation of the argument
> passed to fseek. What machines do not have long as at least a 32
> bit int....

There are actually about three separate problems here.

First is that non-Unix systems often do not support the Unix file
model, and the argument to fseek cannot be a byte offset -- it has
to be a "magic cookie" containing more complex information.  In
such circumstances, there are no meaningful manipulations you can
do on the fseek argument; the only valid argument is something
previously obtained from ftell.
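
A small sketch of that discipline, for illustration only: the helper
name peek_byte is invented here, and the code assumes nothing about the
value ftell returns beyond its being acceptable to fseek with SEEK_SET.
The position is stored and handed back unchanged, with no arithmetic
done on it.

```c
#include <assert.h>
#include <stdio.h>

/* Read one byte at the current position, then return to that position.
   The ftell result is treated as an opaque token.  Returns the byte
   read, or EOF on end of file or on any failure. */
int peek_byte(FILE *fp)
{
    long mark;
    int c;

    assert(fp != NULL);

    mark = ftell(fp);              /* remember where we are */
    if (mark == -1L)
        return EOF;

    c = getc(fp);                  /* look at the next byte */

    if (fseek(fp, mark, SEEK_SET) != 0)   /* hand the token back as is */
        return EOF;
    return c;
}
```

Standard C as later published also provides fgetpos and fsetpos, whose
fpos_t position type is opaque by design.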

Second, 32 bits no longer looks infinite as a file size.  There are Unix
sites (Chemical Abstracts, for one) which routinely deal with gigabyte
files.  It would be nice if the standard would leave the nature of the
ftell/fseek file position open, to permit graceful growth.

Third is that there are some implementations with 64-bit longs and
32-bit file offsets.  In this situation, "long" is the wrong type
to feed to fseek.  (Please, no flames about the desirability of
these sizes for integers; I didn't do it.)

I was most disappointed to see that the ANSI document defined the
fseek argument to be "long".  At the very least, it should have been
left as a symbolically-named integer type.  I am not entirely averse
to the idea that it should be a symbolically-named something, with
the details left entirely up to the implementation.  I see nothing
that would break.

				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

------------------------------

End of mod.std.c Digest - Mon, 11 Mar 85 12:02:43 EST
******************************
USENET -> posting only through cbosgd!std-c.
ARPA -> ... through cbosgd!std-c@BERKELEY.ARPA (NOT to INFO-C)
In all cases, you may also reply to the author(s) above.