[comp.lang.c] File descriptors and streams and copying thereof.

rds95@leah.Albany.Edu (Robert Seals) (04/12/89)

I want to be able to make "stdin" read from someplace besides, well,
standard input in the middle of my program, and then go back to where
it was again. Think "lex" and "yacc"...

So what I did was this:

	FILE *my_file;

	/* successfully open "my_file" */
	...
	stdin->fd = my_file->fd;
	if (yyparse()) /* stuff*/ else /* stuff */
	stdin->fd = 0;	/* presume stdin is/was fd 0 */
	...

which seems to work - most of the time. So, is this truly icky style
or what, and what is the right way to do it?
And BTW, if you refer me to "dup" or something like it, could you also
please explain what it's supposed to do...the man pages for it are not
at all clear to me.

muchos,
rob

gjalt@euteal (Gjalt de Jong) (04/13/89)

I found this piece of code in the original article:

>	stdin->fd = my_file->fd;
>	if (yyparse()) /* stuff*/ else /* stuff */
>	stdin->fd = 0;	/* presume stdin is/was fd 0 */

I hope there are not many people out there using this trick to get the
file descriptor of a stream, because not all machines have this file
descriptor field in the FILE structure!

But there are better ways to get information about streams. You can
use 'fileno', which is defined in <stdio.h>, to get the file descriptor of
a stream. 
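
A minimal sketch of that approach, assuming a POSIX-style <stdio.h> that
provides fileno(). The helper name descriptor_of is hypothetical and only
there for illustration:

```c
#define _POSIX_C_SOURCE 200809L   /* fileno() comes from POSIX, not ISO C */
#include <stdio.h>

/* Hypothetical helper: return the descriptor behind a stream, or -1 if the
   stream pointer is null.  Nothing here looks inside the FILE structure. */
int descriptor_of(FILE *fp)
{
    return fp != NULL ? fileno(fp) : -1;
}
```

A call such as descriptor_of(stdin) then replaces any use of stdin->fd.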

--
Gjalt G. de Jong,                 | Phone: +(31)40-473345
Eindhoven University of Technology, Dept. of Electr. Eng.
P.O. Box 513, 5600 MB Eindhoven, The Netherlands
Email: gjalt@euteal       UUCP: {...}!mcvax!hp4nl!euteal!gjalt

kremer@cs.odu.edu (Lloyd Kremer) (04/14/89)

In article <1743@leah.Albany.Edu> rds95@leah.Albany.Edu (Robert Seals) writes:

>I want to be able to make "stdin" read from someplace besides, well,
>standard input in the middle of my program, and then go back to where
>it was again.
>
>So what I did was this:
>
>	FILE *my_file;
>
>	/* successfully open "my_file" */
>	...
>	stdin->fd = my_file->fd;
>	if (yyparse()) /* stuff*/ else /* stuff */
>	stdin->fd = 0;	/* presume stdin is/was fd 0 */
>	...


Arbitrarily changing the FILE's descriptor without any other treatment of the
stdio stream can confuse stdio terribly.  If the stream's buffer has any
unflushed data, they may be lost or go to the "wrong place."

Although you are performing a mixture of high-level and low-level I/O
operations, your declaration of 'my_file' as a 'FILE *' would suggest that you
ultimately want to do high-level I/O.

Reserving the right to go back to the original stdin complicates things.  If
the work requiring redirected input is performed entirely within a child
process, the best bet would be to fork the process and rehook stdin between
the fork and the exec:

	if(!fork()){
		if(freopen("my_filename", "r", stdin) == (FILE *)0 || exec*(...) == -1){
			perror("");
			exit(1);
		}
	}
	wait(...);

The parent does not need to restore its stdin, since its stdin was never
redirected.
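
A fuller sketch of the fork-and-rehook idea, assuming a POSIX system. The
helper name run_with_stdin_from is hypothetical, and execvp() is used here
only as one member of the exec family; the fragment above leaves that
choice open:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] in a child whose stdin comes from `path`.
   Returns the child's exit status, or -1 if fork/wait fails or the child
   did not exit normally. */
int run_with_stdin_from(const char *path, char *const argv[])
{
    pid_t pid = fork();
    int status;

    if (pid == -1)
        return -1;                      /* fork failed */

    if (pid == 0) {                     /* child */
        if (freopen(path, "r", stdin) == NULL) {
            perror(path);
            _exit(127);
        }
        execvp(argv[0], argv);          /* returns only on failure */
        perror(argv[0]);
        _exit(127);
    }

    /* parent: its own stdin was never touched */
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The child leaves through _exit() rather than exit() so that it does not
flush stdio buffers it inherited from the parent.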

For the more general case in which stdin is to be temporarily changed and
subsequently restored within the same executable, it depends on whether
low-level or high-level I/O is required.  If only low-level I/O
(using integer file descriptors) is required, this should suffice:


	int insave;
	/* for storage of original input file descriptor */


	insave = dup(0);
	/* give me a new file descriptor referring to the same file as
		stdin's original file descriptor */

	close(0);
	/* temporarily close fd 0 */

	if(open("my_filename", O_RDONLY) == -1)
	/* attempt to open your file.  If open succeeds, it will return
		file descriptor 0, since it always returns the earliest
		one, and we just closed 0 */

		forget the whole thing;
	else{
		do work requiring input redirection;
		close(0);
	}
	dup(insave);
	/* get a copy of the saved original input file descriptor.
		Again we know the lowest available file descriptor (0)
		will be returned */

	close(insave);
	/* we don't need the copy any more */

	proceed with non-redirected work;
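
The same save/redirect/restore sequence can be sketched as compilable C.
This is a variant, not the code above: it uses dup2() to place a descriptor
on slot 0 explicitly instead of relying on close() followed by open() or
dup() landing there. It assumes a POSIX system; with_fd0_from and
count_bytes_on_fd0 are hypothetical names:

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Example job: count the bytes readable from descriptor 0 until end of
   file.  Returns -1 on a read error. */
long count_bytes_on_fd0(void)
{
    char buf[512];
    long total = 0;
    ssize_t n;

    while ((n = read(0, buf, sizeof buf)) > 0)
        total += n;
    return n == 0 ? total : -1;
}

/* Hypothetical helper: run job() with descriptor 0 reading from `path`,
   then put the original descriptor 0 back.  Returns job()'s result, or -1
   if the redirection could not be set up. */
long with_fd0_from(const char *path, long (*job)(void))
{
    int saved, newfd;
    long result;

    saved = dup(0);                 /* second handle on the original input */
    if (saved == -1)
        return -1;

    newfd = open(path, O_RDONLY);
    if (newfd == -1) {
        close(saved);               /* fd 0 has not been touched */
        return -1;
    }

    if (dup2(newfd, 0) == -1) {     /* fd 0 now refers to `path` */
        close(newfd);
        close(saved);
        return -1;
    }
    close(newfd);

    result = job();

    dup2(saved, 0);                 /* fd 0 refers to the original input again */
    close(saved);
    return result;
}
```

For example, with_fd0_from("my_filename", count_bytes_on_fd0) returns the
size of the file as seen through descriptor 0 and leaves descriptor 0
pointing where it did before the call.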


But since you want high-level structures, we must enhance it a bit:


	int insave;
	/* for storage of original input file descriptor */


	insave = dup(fileno(stdin));
	/* give me a new file descriptor referring to the same file as
		stdin's original file descriptor (which was probably 0
		but let's not assume anything) */

	if(freopen("my_filename", "r", stdin) == (FILE *)0)
	/* fclose (including fflush) stdin, and perhaps fopen your file.
		If fopen succeeds, it will return a pointer to the same
		FILE as the original stdin had, since it always returns
		the earliest one, and stdin refers to _iob[0] */

		forget the whole thing;
	else{
		do work requiring input redirection;
		fclose(stdin);
	}
	fdopen(dup(insave), "r");
	/* get a copy of the saved original input file descriptor and
		associate it with a buffered stdio FILE structure.
		Again we know the lowest file descriptor and the lowest
		FILE (_iob[0]) will be returned */

	close(insave);
	/* we don't need the copy any more */

	proceed with non-redirected work;


					Lloyd Kremer
					Brooks Financial Systems
					{uunet,sun,...}!xanth!brooks!lloyd

rsalz@bbn.com (Rich Salz) (04/14/89)

In article <1743@leah.Albany.Edu> rds95@leah.Albany.Edu (Robert Seals) writes:
-I want to be able to make "stdin" read from someplace besides, well,
-standard input in the middle of my program, and then go back to where
-it was again.
-
-So what I did was this:
-	FILE *my_file;
-	/* successfully open "my_file" */
-	stdin->fd = my_file->fd;
-	if (yyparse()) /* stuff*/ else /* stuff */
-	stdin->fd = 0;	/* presume stdin is/was fd 0 */

I had replied by mail, but all the detailed replies kind of scared
me...  Changing stdin in the middle of a run is a very implementation-
specific task, and should be directed to comp.unix.wizards, comp.sys.vms,
comp.sys.mac, etc., as appropriate.

Robert wants to have yyparse() read from a different file.  We did a
go-round on this a couple of months ago.  The answer is to provide your
own routine named input():
	#undef input
	input()
	{
		int c;

		if (read_from_stdin)
			c = getchar();
		else {
		    open_file_if_necessary();
		    c = getc(file);
		    if (feof(file))
			fclose(file);
		}
		return c == EOF ? '\0' : c;
	}
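
A self-contained sketch of the same idea, using only ISO C stdio. The names
alt_input, set_alt_input and next_input_char are hypothetical; in a real
lex scanner the function would take the place of input() as shown above:

```c
#include <stdio.h>

/* Where the scanner currently reads from; NULL means "use stdin".
   (Hypothetical name, standing in for the read_from_stdin/file pair.) */
static FILE *alt_input = NULL;

/* Switch to reading from fp; pass NULL to go back to stdin. */
void set_alt_input(FILE *fp)
{
    alt_input = fp;
}

/* Character source in the style of the input() routine above: returns the
   next character, or 0 at end of input. */
int next_input_char(void)
{
    int c = getc(alt_input != NULL ? alt_input : stdin);
    return c == EOF ? 0 : c;
}
```

Because stdin itself is never reopened or closed, going back to it is just
set_alt_input(NULL), and nothing buffered on stdin is disturbed.
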
-- 
Please send comp.sources.unix-related mail to rsalz@uunet.uu.net.

gregg@ihlpb.ATT.COM (Wonderly) (04/16/89)

From article <1672@fig.bbn.com>, by rsalz@bbn.com (Rich Salz):
> In article <1743@leah.Albany.Edu> rds95@leah.Albany.Edu (Robert Seals) writes:
> >I want to be able to make "stdin" read from someplace besides, well,
> >standard input in the middle of my program, and then go back to where
> >it was again.
> 
> Robert wants to have yyparse() read from a different file.  We did a
> go-round on this a couple of months ago.  The answer is to provide your
> own routine named input():

The external symbol, yyin, is a (FILE *) which lex uses to read from.  Yyin
is initialized with (stdin), so you can just change it to whatever stream you
need, and then change it back.

-- 
Gregg Wonderly                             DOMAIN: gregg@ihlpb.att.com
AT&T Bell Laboratories                     UUCP:   att!ihlpb!gregg

henry@utzoo.uucp (Henry Spencer) (04/16/89)

In article <10222@ihlpb.ATT.COM> gregg@ihlpb.ATT.COM (Wonderly) writes:
>> Robert wants to have yyparse() read from a different file.  We did a
>> go-round on this a couple of months ago.  The answer is to provide your
>> own routine named input():
>
>The external symbol, yyin, is a (FILE *) which lex uses to read from.  Yyin
>is initialized with (stdin), so you can just change it to whatever stream you
>need, and then change it back.

We did a go-round on that too a couple of months ago.  Although it's very
attractive, yyin is **NOT** a documented part of the user interface
of lex and hence its use is not portable.
-- 
Welcome to Mars!  Your         |     Henry Spencer at U of Toronto Zoology
passport and visa, comrade?    | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

gwc@root.co.uk (Geoff Clare) (04/21/89)

In article <8450@xanth.cs.odu.edu> kremer@cs.odu.edu (Lloyd Kremer) writes:
[trimmed down a bit - gwc]
>
>	int insave;
>	insave = dup(fileno(stdin));
>	if(freopen("my_filename", "r", stdin) == (FILE *)0)
>		forget the whole thing;
>	else{
>		do work requiring input redirection;
>		fclose(stdin);
>	}
>	fdopen(dup(insave), "r");
>	/* get a copy of the saved original input file descriptor and
>		associate it with a buffered stdio FILE structure.
>		Again we know the lowest file descriptor and the
>		lowest FILE (_iob[0]) will be returned */
		^^^^^^^^^^^
		* This is a false assumption *
>
>	close(insave);

I don't know where Lloyd got this idea from, but there is nothing in
any standard which guarantees a newly opened FILE will be the lowest
available.  In fact "lowest" may not have any sensible meaning on an
implementation which does not have an "_iob[]" style array of FILE
structures.

So be warned.  Code which relies on this behaviour is not portable.

It is, of course, safe to assume the stream returned by fopen() and
friends has the lowest available file descriptor (i.e. fileno(stream)),
but portable code should not assume that

	fclose(stdin); newstream = fdopen(somefd, "r");

will result in newstream==stdin.
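
A sketch of the portable alternative, assuming POSIX dup() and fdopen():
keep the pointer that fdopen() returns and read through it, rather than
assuming it equals stdin. The helper name stream_on_copy_of is hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: build a fresh read stream on a copy of saved_fd.
   The caller must use the returned pointer; nothing guarantees that it
   is the same object as stdin.  Returns NULL on failure. */
FILE *stream_on_copy_of(int saved_fd)
{
    int copy = dup(saved_fd);
    FILE *fp;

    if (copy == -1)
        return NULL;
    fp = fdopen(copy, "r");
    if (fp == NULL)
        close(copy);                /* do not leak the duplicate */
    return fp;
}
```

Code that follows would then call getc(fp), fgets(buf, n, fp) and so on
with the returned pointer instead of getchar() or gets-style calls that
implicitly use stdin.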

Going back to the original question, temporary redirection of stdin can
be achieved (on UNIX(tm) systems) by a slight modification to the method
for "low-level I/O" which Lloyd gave in his article.

All that is necessary is a fflush(stdin) before any change of
the file associated with fileno(stdin).  This will result in loss
of data already read into the buffer, but the same is true of
Lloyd's proposed method, due to the freopen().

-- 

Geoff Clare    UniSoft Limited, Saunderson House, Hayne Street, London EC1A 9HH
gwc@root.co.uk   ...!mcvax!ukc!root44!gwc   +44-1-315-6600  FAX: +44-1-315-6622

bills@sequent.UUCP (Bill Sears) (04/25/89)

In article <8450@xanth.cs.odu.edu> kremer@cs.odu.edu (Lloyd Kremer) writes:
>	close(0);
>
>	if(open("my_filename", O_RDONLY) == -1)
>	/* attempt to open your file.  If open succeeds, it will return
>		file descriptor 0, since it always returns the earliest
>		one, and we just closed 0 */

And in article <731@root44.co.uk> gwc@root.co.uk (Geoff Clare) writes:
>It is, of course, safe to assume the stream returned by fopen() and
>friends has the lowest available file descriptor (i.e. fileno(stream)),
>...

It is NOT safe to assume that the file descriptor returned either
by open(2) or the fopen(3) family will be the lowest available file
descriptor.  Although this may be true on many machines, it is not
guaranteed to be the case (I have worked with a machine where this
did not always work).  Nowhere in the manuals that I have seen (v7,
SYS V.2, BSD4.2, BSD4.3) does the open(2) system call or the fopen(3)
library function guarantee that the file descriptor returned will
be the lowest one available, only the dup(2) system call makes that
guarantee.

karl@haddock.ima.isc.com (Karl Heuer) (04/26/89)

In article <1522@cfa.cfa.harvard.EDU> gwc@root.co.uk (Geoff Clare) writes:
>All that is necessary is a fflush(stdin) before any change of
>the file associated with fileno(stdin).  This will result in loss
>of data already read into the buffer, but the same is true of
>Lloyd's proposed method, due to the freopen().

Since we're talking about portable mechanisms, fflush() on an input stream
should also be avoided.  It doesn't work in all existing UNIXes, and it's
an undefined operation according to the pANS.

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

kremer@cs.odu.edu (Lloyd Kremer) (04/26/89)

In article <15039@sequent.UUCP> bills@sequent.UUCP (Bill Sears) writes:

>It is NOT safe to assume that the file descriptor returned either
>by open(2) or the fopen(3) family will be the lowest available file
>descriptor.  Although this may be true on many machines, it is not
>guaranteed to be the case (I have worked with a machine where this
>did not always work).
>only the dup(2) system call makes that guarantee.


I do not dispute this observation in any way, although I find the
existence of a system where open() does not return the lowest available
descriptor quite curious.

It would seem to imply that this system has two different "gimme a file
descriptor" kernel routines, one guaranteeing the lowest and one not.
Also, it is common practice for systems to create the first three descriptors
originally by the sequence open(), dup(), dup().  In this system, they would
have to use another (more complicated) method to guarantee that the three
open descriptors are 0, 1, and 2, rather than, say, 13, 0, and 1.  In other
words, it seems they would have to do more work *not* to guarantee that open()
returns the lowest.

I would say that the designers of that system have never ascended the Sacred
Mountain.  They have not fasted and meditated and finally attained the
Spirit of UNIX!  :-) :-)

(Flame resistant suit on!!)

-- 
					Lloyd Kremer
					Brooks Financial Systems
					...!uunet!xanth!brooks!lloyd
					Have terminal...will hack!

gwyn@smoke.BRL.MIL (Doug Gwyn) (04/26/89)

In article <8624@xanth.cs.odu.edu> kremer@cs.odu.edu (Lloyd Kremer) writes:
>In article <15039@sequent.UUCP> bills@sequent.UUCP (Bill Sears) writes:
>>It is NOT safe to assume that the file descriptor returned either
>>by open(2) or the fopen(3) family will be the lowest available file
>>descriptor.
>I would say that the designers of that system have never ascended the Sacred
>Mountain.  They have not fasted and meditated and finally attained the
>Spirit of UNIX!  :-) :-)

open(), dup(), etc. system calls are guaranteed to return the lowest
file descriptor not currently open for the process (when they succeed).
This has been codified in IEEE Std 1003.1, which also requires that
fopen() allocate a file descriptor as open() does.  There is no such
requirement on opendir(), which may or may not invoke open() depending
on implementation-specific circumstances.

Of course, systems that do not attempt POSIX conformance may do as
they please.  Non-UNIX systems may not even have an inherent notion
of file descriptor as a small integer index involved in their fopen()
implementations.
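
A small probe for this behaviour, sketched under the assumption of a
POSIX-style open()/close(). The function name open_reuses_lowest is
hypothetical, and a result of 1 is evidence about one run, not proof of
conformance:

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Open `path` twice, close the first descriptor, then open again.
   On a system where open() returns the lowest free descriptor, the third
   open() must hand back the slot that was just freed.
   Returns 1 if it did, 0 if it did not, -1 if the probe could not run. */
int open_reuses_lowest(const char *path)
{
    int a, b, c, result;

    a = open(path, O_RDONLY);
    if (a == -1)
        return -1;
    b = open(path, O_RDONLY);
    if (b == -1) {
        close(a);
        return -1;
    }

    close(a);                       /* free the lower slot */
    c = open(path, O_RDONLY);
    if (c == -1) {
        close(b);
        return -1;
    }

    result = (c == a);
    close(b);
    close(c);
    return result;
}
```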

bills@sequent.UUCP (Bill Sears) (04/27/89)

In article <8624@xanth.cs.odu.edu> kremer@cs.odu.edu (Lloyd Kremer) writes:
>
>In article <15039@sequent.UUCP> bills@sequent.UUCP (Bill Sears) writes:
>
>>It is NOT safe to assume that the file descriptor returned either
>>by open(2) or the fopen(3) family will be the lowest available file
>>descriptor.  Although this may be true on many machines, it is not
>>guaranteed to be the case (I have worked with a machine where this
>>did not always work).
>>only the dup(2) system call makes that guarantee.
>
>I do not dispute this observation in any way, although I find the
>existence of a system where open() does not return the lowest available
>descriptor quite curious.

I don't pretend to understand how the system was implemented, as I
didn't have source code for the OS.  I spoke about this at some length
with the owners of the system (a group of consultants who contracted
UNIX operating system and C language training) and neither they nor
I could fathom a decent reason for the above behavior.  The only idea
that even half made sense was that the OS kept a record of the next
available file descriptor.  This made a search of the file descriptor
table unnecessary when an open was called.  Of course this doesn't
completely explain the problem, because some sort of search is still
necessary to find the next available file descriptor for the next call
to open :-(  (although the test programs did seem to reflect this behavior).
As I recall the system was a small 80?86 system running some variant
of UNIX, but I can't remember the specifics (I either never knew or
have forgotten).

gwc@root.co.uk (Geoff Clare) (04/28/89)

This is getting rather UNIX-specific, so readers of comp.lang.c who
aren't interested in what goes on "behind" fopen() etc. should hit 'n' now.


In article <731@root44.co.uk> I wrote:
>It is, of course, safe to assume the stream returned by fopen() and
>friends has the lowest available file descriptor (i.e. fileno(stream)),
>...

In article <15039@sequent.UUCP> bills@sequent.UUCP (Bill Sears) writes:
>It is NOT safe to assume that the file descriptor returned either
>by open(2) or the fopen(3) family will be the lowest available file
>descriptor.  Although this may be true on many machines, it is not
>guaranteed to be the case (I have worked with a machine where this
>did not always work).  Nowhere in the manuals that I have seen (v7,
>SYS V.2, BSD4.2, BSD4.3) does the open(2) system call or the fopen(3)
>library function guarantee that the file descriptor returned will
>be the lowest one available, only the dup(2) system call makes that
>guarantee.


I wasn't aware that there are still some backward systems around that
don't give this behaviour.  All of the many and various systems I have
worked on always give the lowest available file descriptor howsoever
it is obtained.  I still claim it is reasonable to assume this behaviour
because, in the future, all standards-conforming systems will guarantee it.

Open(), creat(), and dup() are all guaranteed to give the lowest available
file descriptor by the SVID, X/Open (XPG2 and XPG3) and POSIX.  POSIX also
guarantees this behaviour for pipe() (at least it says the two descriptors
will be the two lowest available - unfortunately it doesn't say the reading
side must be lower than the writing side, as one might assume).
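
The pipe() point can be probed in the same spirit. This sketch assumes a
POSIX system with a readable /dev/null, and the function name
pipe_uses_two_lowest is hypothetical. It accepts either ordering of the
two descriptors, since only the pair is specified:

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Find the two lowest free descriptors by opening and closing /dev/null
   twice (this itself assumes open() returns the lowest free slot), then
   check that pipe() uses exactly that pair.
   Returns 1 if it did (in either order), 0 if not, -1 on setup failure. */
int pipe_uses_two_lowest(void)
{
    int a, b, p[2], result;

    a = open("/dev/null", O_RDONLY);
    if (a == -1)
        return -1;
    b = open("/dev/null", O_RDONLY);
    if (b == -1) {
        close(a);
        return -1;
    }
    close(a);
    close(b);                       /* a and b are free again */

    if (pipe(p) == -1)
        return -1;
    result = (p[0] == a && p[1] == b) || (p[0] == b && p[1] == a);
    close(p[0]);
    close(p[1]);
    return result;
}
```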

As for fopen() etc., POSIX says (8.2.3.1) "fopen() shall allocate a
file descriptor as open() does" and (8.2.3.3) "freopen() has the
properties of both fclose() and fopen()".

If anyone works on a system which doesn't behave like this, badger your
vendor to get it changed, because there are soon going to be a lot
of POSIX applications around which rely on it.

-- 

Geoff Clare    UniSoft Limited, Saunderson House, Hayne Street, London EC1A 9HH
gwc@root.co.uk   ...!mcvax!ukc!root44!gwc   +44-1-315-6600  FAX: +44-1-315-6622