[comp.lang.c] magic cookies given back by ftell, and used in fseek

dg@lakart.UUCP (David Goodenough) (05/21/88)

There has been much discussion in this group on the fact that on certain
systems (VMS I believe), ftell & fseek use magic cookies to tell about
file position. I have my asbestos suit ready if this suggestion is out of
line, but have the ANSI committee considered providing some means to convert
from cookie format to and from character position in file? Come to that,
is such a thing possible? I ask this based on the statement that the
cookie returned at end of file when gotten there in two different ways
was different. Am I correct in perceiving that this arises in a similar
manner to the fact that on some segmented architectures (e.g. 8088) there
are many different ways of pointing to the same address, or is it some
other reason? Since my knowledge of VMS is about nil, this is just a guess;
doubtless there will be many out there who can tell me the correct way
of doing things.
-- 
	dg@lakart.UUCP - David Goodenough		+---+
							| +-+-+
	....... !harvard!adelie!cfisun!lakart!dg	+-+-+ |
						  	  +---+

ok@quintus.UUCP (Richard A. O'Keefe) (05/26/88)

In article <129@lakart.UUCP>, dg@lakart.UUCP (David Goodenough) writes:
> There has been much discussion in this group on the fact that on certain
> systems (VMS I believe), ftell & fseek use magic cookies to tell about
> file position. I have my asbestos suit ready if this suggestion is out of
> line, but have the ANSI committee considered providing some means to convert
> from cookie format to and from character position in file? Come to that,
> is such a thing possible?

Given the history of C, I suggest that the "magic cookie" idea may have had
more to do with the /370 or GECOS(sp?) implementations of C than with VMS.
Consider an IBM/370 "VB" file (variable length blocked records).  The
operating system wants you to position in the file by specifying the
relative block number (this is a gross oversimplification, please don't
flame me, IBM fans).  If "character position" is interpreted as
(count of previous *blocks*) x (size of block) + (offset in current block)
then something could be done, but that won't be equal to the number of
characters getc() gave you while you were getting there.  For *real* fun
(fun until it hurts) try to figure out what magic cookies should do with
concatenated data sets.

Magic cookies should really be some sort of implementation-specific record.

The UNIX "ftell" manual page warns that on non-UNIX systems "arithmetic may
not meaningfully be performed on" magic cookies, and you have to regard
comparing for equality as arithmetic.  All you can do with a magic cookie
is say "go back there".
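
As an illustration of the "go back there" discipline, here is a minimal
sketch in C.  The helper name peek_char is hypothetical, not part of stdio.
The value from ftell() is stored and handed back to fseek() unchanged; the
only test made on it is the documented -1L failure return.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read one character, then return to where we were.
 * The ftell() value is treated as opaque: it is never added to,
 * subtracted from, or compared with another position. */
int peek_char(FILE *fp)
{
    long mark;
    int c;

    assert(fp != NULL);
    mark = ftell(fp);                 /* remember "here" */
    if (mark == -1L)
        return EOF;                   /* stream cannot report a position */
    c = getc(fp);
    if (fseek(fp, mark, SEEK_SET) != 0)
        return EOF;                   /* could not go back */
    return c;
}
```

Nothing here depends on what the number means, so the same code should
behave on a byte-stream system and on a record-oriented one.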

djones@megatest.UUCP (Dave Jones) (05/26/88)

in article <129@lakart.UUCP>, dg@lakart.UUCP (David Goodenough) says:
> 
> There has been much discussion in this group on the fact that on certain
> systems (VMS I believe), ftell & fseek use magic cookies to tell about
> file position. I have my asbestos suit ready if this suggestion is out of
> line, but have the ANSI committee considered providing some means to convert
> from cookie format to and from character position in file?

The problem is that some operating systems have file-format built into
them :-(, whereas UNIX has unformatted byte-streams :-).

One example is to store "lines" in variable length records, prefixed by a 
line-length number. So now if the devil chooses a byte-count offset x, and 
says fseek(x), you've got to do some pretty fancy stepping to find the
correct record and offset into it.  For such a file, ftell would
presumably return a magic cookie which codes for record number and
offset.
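
To make the idea concrete, here is a rough sketch of such an encoding.
The layout (low 9 bits for the offset, the remaining bits for the record
number) and the names make_cookie, cookie_record and cookie_offset are
assumptions for illustration, not any real system's format.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical layout: the low 9 bits hold the offset within a record
 * (0..511); the remaining bits hold the record number. */
#define OFFSET_BITS 9
#define OFFSET_MASK ((1L << OFFSET_BITS) - 1)

/* Pack a record number and an offset into one long. */
long make_cookie(long record, long offset)
{
    assert(record >= 0 && record <= (LONG_MAX >> OFFSET_BITS));
    assert(offset >= 0 && offset <= OFFSET_MASK);
    return (record << OFFSET_BITS) | offset;
}

/* Recover the two fields from a cookie built by make_cookie(). */
long cookie_record(long cookie) { return cookie >> OFFSET_BITS; }
long cookie_offset(long cookie) { return cookie & OFFSET_MASK; }
```

With this layout the cookie for the start of record 1 is 512 no matter
how many characters record 0 held, so subtracting two cookies says
nothing about how many characters lie between them.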

Sigh.


		-- Dave J.

davidsen@steinmetz.ge.com (William E. Davidsen Jr) (05/26/88)

In article <129@lakart.UUCP> dg@lakart.UUCP (David Goodenough) writes:
>There has been much discussion in this group on the fact that on certain
>systems (VMS I believe), ftell & fseek use magic cookies to tell about

  As far as I know the value returned by (f)tell is a real address in
the file. The kicker is that the beginning of consecutive ten byte lines
may not be ten bytes apart, or even any fixed number of bytes. This is
an artifact of VMS having 11 (or so) file types.

  At one time the C library was so hosed that I gave my SysMgr the
following for report as an SPR:

	Problem: random i/o in VMS-C doesn't work

	Demo:
	... read part of a file ...
	val1 = ftell(fp);	/* ask where we are	*/
	val2 = ftell(fp);	/* ask again		*/
	if (val1 != val2) 
	{ /* this system hasn't a clue... */
	  fprintf(stderr, "Random file access error\n");
	  exit(2);
	}

Needless to say I wrote my own i/o routines from scratch, since we
couldn't wait for a fix. I'm told that the reply was "This will be fixed
in a future release." I'm glad I operate one level away from the vendor
(in VMS at least).
-- 
	bill davidsen		(wedu@ge-crd.arpa)
  {uunet | philabs | seismo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me

gwyn@brl-smoke.ARPA (Doug Gwyn ) (05/27/88)

In article <129@lakart.UUCP> dg@lakart.UUCP (David Goodenough) writes:
-... have the ANSI committee considered providing some means to convert
-from cookie format to and from character position in file?

This is required for binary streams, but not for text streams.
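
A small sketch of what that guarantee buys, assuming a binary stream of
fixed-size records.  The helper names seek_record and current_record are
hypothetical.  On a binary stream the ftell() value is a count of
characters from the start of the file, so ordinary arithmetic on it is
meaningful; the same code applied to a text stream would not be portable.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: position a binary stream at the start of the
 * record with the given index.  Returns 0 on success, like fseek(). */
int seek_record(FILE *fp, long index, long record_size)
{
    assert(fp != NULL && index >= 0 && record_size > 0);
    return fseek(fp, index * record_size, SEEK_SET);
}

/* Hypothetical helper: index of the record containing the current
 * position, or -1L if the position cannot be obtained. */
long current_record(FILE *fp, long record_size)
{
    long pos;

    assert(fp != NULL && record_size > 0);
    pos = ftell(fp);
    return pos == -1L ? -1L : pos / record_size;
}
```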

-Am I correct in perceiving that this arises in a similar
-manner to the fact that on some segmented architectures (e.g. 8088) there
-are many different ways of pointing to the same address, ...

Yes, you have the right idea.

karl@haddock.ISC.COM (Karl Heuer) (05/27/88)

In article <1024@cresswell.quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
>The UNIX "ftell" manual page warns that on non-UNIX systems "arithmetic may
>not meaningfully be performed on" magic cookies, and you have to regard
>comparing for equality as arithmetic.

That's not the common interpretation of "performing arithmetic".  I think the
ANSI Standard should explicitly state which comparison operators, if any, are
guaranteed to be meaningful for fseek-cookies in a conforming implementation.

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

ok@quintus.UUCP (Richard A. O'Keefe) (05/27/88)

In article <1024@cresswell.quintus.UUCP> I wrote:
> The UNIX "ftell" manual page warns that on non-UNIX systems "arithmetic may
> not meaningfully be performed on" magic cookies, and you have to regard
> comparing for equality as arithmetic.
In article <4253@haddock.ISC.COM>, karl@haddock.ISC.COM (Karl Heuer) wrote:
> That's not the common interpretation of "performing arithmetic".

What we have here is an abstract data type "stream position"
whose representation happens to be an integer.
Except in those implementations which explicitly define the representation
(as UNIX defines it to be a byte count) there is no reason to expect the
representation relation to have any particular properties (putting some
number of randomly generated bits in, provided that fseek() ignored them,
would be within the "abstract data type" idea).
We already know that memcmp() is not a good method of comparing
abstract data types represented as records, and for the same reason
'==' is not a good method of comparing abstract data types represented
as integers.  I think it is fair to say that ftell(stdin) == ftell(stdout)
is performing an arithmetic operation, because == is an arithmetic
comparison.

I think it was Chris Torek who said that he had a new version of stdio
which let the programmer specify his own read/write/close/&c.  One thing
I would like in that is programmer-defined stream position objects.
For example, suppose I want to implement "concatenated files" (which I
have wanted to do).  Then a stream position might be an index into a
table (or even a pointer) and a byte offset within the selected file.
I can't do that with only 32 bits to play with.  (Well, I can malloc()
little blocks and return pointers to them, but there are problems with that.)
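
A minimal sketch of what such a position object might look like for
concatenated files.  Everything here (struct cat_stream, struct cat_pos,
cat_getc, cat_tell, cat_seek) is hypothetical and sits on top of ordinary
stdio rather than inside it; the position is a record, not a long.

```c
#include <assert.h>
#include <stdio.h>

struct cat_stream {
    FILE **files;       /* member files, in reading order */
    size_t count;       /* number of member files */
    size_t current;     /* index of the file now being read */
};

struct cat_pos {
    size_t file_index;  /* which member file */
    long offset;        /* ftell() value within that file */
};

/* Read the next character, moving on to the next file at end of file. */
int cat_getc(struct cat_stream *cs)
{
    assert(cs != NULL);
    while (cs->current < cs->count) {
        int c = getc(cs->files[cs->current]);
        if (c != EOF)
            return c;
        cs->current++;
        if (cs->current < cs->count)
            rewind(cs->files[cs->current]);  /* start next file at its top */
    }
    return EOF;
}

/* Record the current position.  Returns 0 on success, -1 on failure. */
int cat_tell(struct cat_stream *cs, struct cat_pos *pos)
{
    assert(cs != NULL && pos != NULL);
    pos->file_index = cs->current;
    pos->offset = 0;
    if (cs->current == cs->count)
        return 0;                            /* already past the last file */
    pos->offset = ftell(cs->files[cs->current]);
    return pos->offset == -1L ? -1 : 0;
}

/* Return to a position recorded by cat_tell(). */
int cat_seek(struct cat_stream *cs, const struct cat_pos *pos)
{
    assert(cs != NULL && pos != NULL);
    if (pos->file_index > cs->count)
        return -1;
    cs->current = pos->file_index;
    if (cs->current == cs->count)
        return 0;
    return fseek(cs->files[cs->current], pos->offset, SEEK_SET);
}
```

Note that the per-file offset is only ever passed back to fseek() on the
same file, so it may itself be a magic cookie.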

henry@utzoo.uucp (Henry Spencer) (05/28/88)

> ... have the ANSI committee considered providing some means to convert
> from cookie format to and from character position in file? Come to that,
> is such a thing possible? ...

Depending on the operating system, it can be arbitrarily difficult.  Worse,
on the more troublesome operating systems, "character position" is not very
useful and may not even be meaningful.  Some kinds of files under some
operating systems have semantics sufficiently different from Unix's simple
"sequence of bytes" concept that there simply is *NO* meaningful mapping
between the two.  You just cannot pretend that those files are sequences of
bytes; it will not work, or at least will not work in any useful way.
-- 
"For perfect safety... sit on a fence|  Henry Spencer @ U of Toronto Zoology
and watch the birds." --Wilbur Wright| {ihnp4,decvax,uunet!mnetor}!utzoo!henry

chris@mimsy.UUCP (Chris Torek) (05/28/88)

In article <1029@cresswell.quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe)
writes:
>What we have here is an abstract data type "stream position"
>whose representation happens to be an integer. ...

... which is somewhat unfortunate; the `format of choice' in C for
ADTs is an opaque pointer (`void *').

>I think it was Chris Torek who said that he had a new version of stdio
>which let the programmer specify his own read/write/close/&c.

Yes.  `&c' is `seek'; these are the only operations used in stdio itself.

>One thing I would like in that is programmer-defined stream position objects.

Unfortunately, if the new interface is to be compatible with the old
---and this appears to be necessary to get the functions adopted anywhere;
certainly I would not install them here if they changed the return value
of ftell!---the return type must remain an integer (`long').

>I can't do [something useful] with only 32 bits to play with.  (Well,
>I can malloc() little blocks and return pointers to them, but there
>are problems with that.)

There *is* a solution, although it is not the prettiest:  Allocate a
block (`array') of real information, and return indices into this
block (a la file descriptors).  The block can be expanded as necessary
by using realloc().  The Unix kernel simplifies the task by fixing
the number of descriptors (usually at something like 20 or 64 or 256).
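
A rough sketch of the table scheme, under stated assumptions: struct
real_pos stands in for whatever the `real information' is, and
cookie_create and cookie_lookup are hypothetical names, not part of any
stdio.  The cookie handed to the caller is just an index, so it still
fits in a `long'.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical "real information" behind one stream position. */
struct real_pos {
    long block;
    long offset;
};

static struct real_pos *table = NULL;   /* grows on demand */
static long table_used = 0;
static long table_cap = 0;

/* Store a position; return its index, or -1L if memory runs out. */
long cookie_create(struct real_pos pos)
{
    if (table_used == table_cap) {
        long new_cap = table_cap ? table_cap * 2 : 16;
        struct real_pos *p = realloc(table, (size_t)new_cap * sizeof *p);
        if (p == NULL)
            return -1L;                 /* old table is still valid */
        table = p;
        table_cap = new_cap;
    }
    table[table_used] = pos;
    return table_used++;
}

/* Copy out the position behind a cookie.  Returns 0 on success,
 * -1 if the cookie was never issued.  The entry is copied because a
 * pointer into the table would go stale at the next realloc(). */
int cookie_lookup(long cookie, struct real_pos *out)
{
    assert(out != NULL);
    if (cookie < 0 || cookie >= table_used)
        return -1;
    *out = table[cookie];
    return 0;
}
```

As written, entries are never released, so the table only grows.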
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

chris@mimsy.UUCP (Chris Torek) (05/28/88)

In article <11695@mimsy.UUCP> I proposed making `seek cookies' in
the following manner:
>Allocate a block (`array') of real information, and return indices
>into this block (a la file descriptors).  The block can be expanded as
>necessary by using realloc().

I then noted that

>The Unix kernel simplifies the task by fixing the number of descriptors
>(usually at something like 20 or 64 or 256).

I stopped here because the fire alarm was going off in my apartment
building.  (False alarm, fortunately.)  There is a more serious
problem with this scheme, and that is that there is no decent way
to tell when an ftell cookie is dead and can be freed.  A small,
limited number of cookies is insufficient.

Of course, one can flush all the cookies for a file when that file
is closed, but it is easy to imagine a file that stays open for
weeks at a time, seeking all the while.

Various solutions exist, none particularly clean.  The most obvious
are to add a function for disposing of seek cookies (`fkillseek()'?),
and to deem that a cookie dies when it is used:

	long save_pos;

	save_pos = ftell(f);
	/* ... work on f ... */
	(void) fseek(f, save_pos, SEEK_SET);

	/* if we need it again, we must reactivate it: */
	save_pos = ftell(f);

The former requires an incompatible addition to stdio, while the
latter requires an incompatible change to those things that use
stdio.  Which would be easier remains uncertain, although at least
the latter rule has the virtue of simplicity.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

terry@wsccs.UUCP (Every system needs one) (06/04/88)

In article <129@lakart.UUCP>, dg@lakart.UUCP (David Goodenough) writes:
> There has been much discussion in this group on the fact that on certain
> systems (VMS I believe), ftell & fseek use magic cookies to tell about
> file position.

	No, not really.  They work fine on stream files.  The problem is
that they return record number on "Implied carriage control" (VMS default)
text files.
> [...] but have the ANSI committee considered providing some means to convert
> from cookie format to and from character position in file? Come to that,
> is such a thing possible?

No.  Only the record format, and ANSI doesn't usually say anything about
Digital's file system... they really shouldn't, anyway, as it is none of their
business.

> [...], or is it some other reason.

Yes.  It's the "real" storage format and RMS file access modes implied in the
C library.

use:

	ungetc(getc(fp), fp);

This will correctly align records after a read so that ftell() doesn't
lie through its figurative teeth.


| Terry Lambert           UUCP: ...{ decvax, ihnp4 } ...utah-cs!century!terry |
| @ Century Software        OR: ...utah-cs!uplherc!sp7040!obie!wsccs!terry    |
| SLC, Utah                                                                   |
|                   These opinions are not my company's, but if you find them |
|                   useful, send a $20.00 donation to Brisbane Australia...   |
| 'Admit it!  You're just harassing me because of the quote in my signature!' |