[comp.unix.questions] File descriptors

robertd@ncoast.UUCP (05/20/87)

 Can some one tell me what a "File Descriptor" is? Thank you.

		[> Rd

-- 
[=====================================]
[             Rob DeMarco             ]
[ UUCP:decvax!cwruecmp!ncoast!robertd ]
[                                     ]
[ "I hate 'Wheel of fortune'....and   ]
[  proud of it!!"                     ]
[=====================================]

gwyn@brl-smoke.UUCP (05/21/87)

In article <2532@ncoast.UUCP> robertd@ncoast.UUCP (Rob DeMarco) writes:
> Can some one tell me what a "File Descriptor" is? Thank you.

It's just an index into an open-file table maintained inside the
kernel; the open-file table is used to keep track of the state of
the open file (such as, where is the actual data, and how far into
the data is the file position pointer associated with this F.D.).
Think of it as a "handle" on the file that the kernel gives you when
you open it.  For further information read Ken Thompson's "UNIX
Implementation" (used to be in Vol. 2 of the UNIX manual set).
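
[A minimal sketch of the "handle" idea, in Python for brevity; its os module
is a thin wrapper over the same system calls. The scratch file and the
function name are made up for illustration. It assumes a UNIX-like system.
Two separate opens of one file give two descriptors, each with its own file
position pointer.]

```python
import os
import tempfile


def separate_opens_have_separate_offsets():
    """Open one file twice; return both descriptors and their offsets."""
    # Scratch file invented for this demonstration only.
    scratch_fd, path = tempfile.mkstemp()
    try:
        os.write(scratch_fd, b"Hello\nworld\n")
        os.close(scratch_fd)

        # Each open() hands back a small non-negative integer.
        fd_a = os.open(path, os.O_RDONLY)
        fd_b = os.open(path, os.O_RDONLY)
        try:
            os.read(fd_a, 6)  # advances only fd_a's position
            offset_a = os.lseek(fd_a, 0, os.SEEK_CUR)
            offset_b = os.lseek(fd_b, 0, os.SEEK_CUR)
        finally:
            os.close(fd_a)
            os.close(fd_b)
        return fd_a, fd_b, offset_a, offset_b
    finally:
        os.unlink(path)
```

Here reading six bytes through the first descriptor leaves the second one
at offset 0, because each open() made its own entry in the kernel's table.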

terryl@tekcrl.TEK.COM (05/22/87)

In article <5875@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>In article <2532@ncoast.UUCP> robertd@ncoast.UUCP (Rob DeMarco) writes:
>> Can some one tell me what a "File Descriptor" is? Thank you.
>
>It's just an index into an open-file table maintained inside the
>kernel; the open-file table is used to keep track of the state of
>the open file (such as, where is the actual data, and how far into
>the data is the file position pointer associated with this F.D.).
>Think of it as a "handle" on the file that the kernel gives you when
>you open it.  For further information read Ken Thompson's "UNIX
>Implementation" (used to be in Vol. 2 of the UNIX manual set).

     I can't believe this. Doug, you really blew it on this one. Close but
no cigar.

     A file descriptor is just an index into a data structure, but that data
structure is part of the PER-PROCESS information, not the system-wide table
Doug is alluding to.  (Since we haven't established an official definition,
let me propose one: a file descriptor is the value you get back from an "open"
or "creat" system call.  If that's not Doug's definition, then he could be
right.)  The per-process data structure does hold pointers into the
system-wide data structures (what Doug refers to as the "open-file table"),
and everything else is as Doug described it (well, there are a couple of
other things, but they really don't have anything to do with the original
question).
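
[A toy model of the two-level arrangement described above, in Python. This
is not kernel code: the class names, the fields, and the "inode" string are
all invented for illustration. A descriptor is a key in a per-process table,
and the value it maps to is an entry in a shared, system-wide list.]

```python
from dataclasses import dataclass, field


@dataclass
class FileTableEntry:
    """One entry in the system-wide open-file table (toy model)."""
    inode: str         # stands in for "where is the actual data"
    offset: int = 0    # the file position pointer
    refcount: int = 0  # how many descriptors point at this entry


@dataclass
class Process:
    """Per-process table: descriptor number -> system-wide entry."""
    fds: dict = field(default_factory=dict)

    def _install(self, entry):
        # Hand out the lowest descriptor number not currently in use.
        fd = 0
        while fd in self.fds:
            fd += 1
        self.fds[fd] = entry
        entry.refcount += 1
        return fd

    def open(self, system_table, inode):
        # A fresh open always creates a fresh system-wide entry.
        entry = FileTableEntry(inode)
        system_table.append(entry)
        return self._install(entry)

    def dup(self, fd):
        # A duplicate descriptor points at the SAME system-wide entry.
        return self._install(self.fds[fd])

    def fork(self):
        # The child gets a copy of the per-process table only; the
        # entries it points to are shared with the parent.
        child = Process(dict(self.fds))
        for entry in child.fds.values():
            entry.refcount += 1
        return child
```

In this model a seek made through a child's descriptor is visible to the
parent, while a second open() of the same file gets an independent offset.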

     All of this is true for Berkeley 4.2/4.3 (I just looked at the code to
make sure 4.3 didn't change what file descriptors mean). I have absolutely
no idea if this is the way System V does things.


				Terry Laskodi
				     of
				Tektronix

guy%gorodish@Sun.COM (Guy Harris) (05/22/87)

>      All of this is true for Berkeley 4.2/4.3 (I just looked at the code just
> to make sure 4.3 didn't change what file descriptors mean). I have absolutely
> no idea if this is the way System V does things.

Well, this happens to be one of the things that has remained pretty
much constant in UNIX implementations since at least V6 (and probably
back further than that).  4.3 didn't change anything of consequence
(which isn't really surprising - there's really little that needs
changing), and neither 4.2BSD, nor System III, nor System V,
introduced any major changes.  (The only change 4.2BSD made was to
have objects other than inodes attached to a file table entry; Bill
Shannon noted that, given a system with multiple "file system types",
it might have been possible to use that mechanism instead.)

The distinction between a file descriptor (i.e., the small
number you get back from "open", "dup", etc.)  and a file table entry
(the entry in the system-wide table that indicates things like the
current seek pointer) is not significant in most cases, so Doug's
description is, at worst, a slight over-simplification.  The only
per-file-descriptor state in the system is the "close on 'exec'"
flag.  Most operations treat all file descriptors that refer to the
same file table entry as equivalent.  To quote from the S5R3 manual
page DUP(2):

	"dup" returns a new file descriptor having the following in
	common with the original:

		Same open file (or pipe).

		Same file pointer (i.e., both file descriptors share
		one file pointer).

		Same access mode (read, write or read/write).

which describes most of the state stored with a file descriptor.
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com
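
[A short sketch of the dup() behaviour quoted above, in Python on a
UNIX-like system. The scratch file and function name are invented for the
example. It checks that a duplicated descriptor shares the file pointer and
the open-file status flags, while the close-on-exec flag can differ between
the two descriptors.]

```python
import fcntl
import os
import tempfile


def dup_shares_state():
    """Return (shared offset, same status flags?, cloexec on fd, on copy)."""
    fd, path = tempfile.mkstemp()  # scratch file, opened read/write
    try:
        os.write(fd, b"Hello\nworld\n")
        os.lseek(fd, 0, os.SEEK_SET)

        copy = os.dup(fd)
        try:
            # Read through the original, then ask the copy where it is.
            os.read(fd, 6)
            shared_offset = os.lseek(copy, 0, os.SEEK_CUR)

            # Status flags (including the access mode) live in the
            # shared file table entry, so both descriptors report them.
            same_flags = (fcntl.fcntl(fd, fcntl.F_GETFL)
                          == fcntl.fcntl(copy, fcntl.F_GETFL))

            # Close-on-exec is per descriptor: set it on one only.
            fcntl.fcntl(fd, fcntl.F_SETFD, fcntl.FD_CLOEXEC)
            fcntl.fcntl(copy, fcntl.F_SETFD, 0)
            cloexec_fd = bool(fcntl.fcntl(fd, fcntl.F_GETFD)
                              & fcntl.FD_CLOEXEC)
            cloexec_copy = bool(fcntl.fcntl(copy, fcntl.F_GETFD)
                                & fcntl.FD_CLOEXEC)
        finally:
            os.close(copy)
        return shared_offset, same_flags, cloexec_fd, cloexec_copy
    finally:
        os.close(fd)
        os.unlink(path)
```

Both flags are set explicitly because recent Python versions mark new
descriptors close-on-exec by default.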

gwyn@brl-smoke.ARPA (Doug Gwyn) (05/23/87)

In article <1673@tekcrl.TEK.COM> terryl@tekcrl.tek.com writes:
>Doug, you really blew it on this one.

Ahem, I certainly am aware that the file-descriptor is an index into
a per-process data structure, which I chose to call an "open-file
table" both for simplicity (after all, if the inquirer knew all this
stuff he wouldn't be asking the question) and because that's what
Ken Thompson called it in the cited article.  Actually, he explained
the distinction between the "per-user open file table", which is
what I was describing, and the more global "open file table".  The
only reason there are two such tables rather than one is to permit
the sharing of the file position pointer across a fork.  I don't see
the necessity for this particular characteristic (I can't recall
ever having made use of it), so as far as I'm concerned there might
as well be only one open-file table.  File descriptors could then be
unique indices into the system table.  (The other use for per-process
indices is that one can then guarantee the use of small values such as
0, 1, and 2.)  I didn't feel it was worth trying to explain this two-
level aspect of the open file tables when the inquirer would
undoubtedly be happy to get any definite grasp of the concept.
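
[A sketch of how the "small values" property gets used, in Python on a
UNIX-like system. The function name and the use of the exit status to
report the result are choices made for this example. It assumes descriptor 1
is open when the function is called. Because a new descriptor always takes
the lowest free number, closing 1 and then calling dup() lands the file on
descriptor 1. The work is done in a forked child so the caller's own
standard output is left alone.]

```python
import os


def redirect_stdout_in_child(path):
    """Fork a child that points descriptor 1 at `path` and writes to it.

    Returns the child's exit status: 0 if dup() returned 1 as expected.
    """
    pid = os.fork()
    if pid == 0:
        status = 1  # reported if anything below raises
        try:
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
            os.close(1)           # free up descriptor 1
            new_fd = os.dup(fd)   # lowest free number; 1 if 0 is in use
            os.close(fd)
            os.write(1, b"Hello\n")
            status = 0 if new_fd == 1 else 2
        finally:
            os._exit(status)      # never return into the caller's code
    _, wait_status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(wait_status)
```

A program exec'ed by that child would write to descriptor 1 as usual and
never need to know that it now refers to a file.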

aegl@root44.UUCP (05/27/87)

In article <5881@brl-smoke.ARPA> gwyn@brl-smoke.ARPA (Doug Gwyn) writes:
>The only reason there are two such tables rather than one is to permit
>the sharing of the file position pointer across a fork.  I don't see
>the necessity for this particular characteristic (I can't recall
>ever having made use of it), so as far as I'm concerned there might
>as well be only one open-file table.

You *must* have used this ... consider what happens when you have
a shell script like this:

	$ cat hello.sh
	/bin/echo "Hello"
	/bin/echo "world"

and you run it with output redirected to a file.

	$ ./hello.sh >outputfile

Your shell opens/creates "outputfile", truncates it, and does tricks with
dup() to make sure it is file descriptor 1. The file pointer for stdout
is now 0. The shell forks and execs the first "echo", which outputs "Hello",
and the file pointer for stdout is set to 6 (5 chars in "Hello" + newline).
Then echo exits and the shell wakes up and forks and execs the next echo. If
the file pointer hadn't been shared across the fork/exec then the shell
would still have it set at 0, so the "world" would get written on top
of the "Hello".  Luckily (for every shell script that ever ran more than
one program that produced any output) the pointer was shared so the "world"
starts at byte offset 6 in "outputfile".

	$ cat outputfile
	Hello
	world
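
[The same experiment can be sketched without a shell, in Python on a
UNIX-like system. The function names are invented, and the children write
directly instead of exec'ing echo. The first function opens the file once
and lets forked children write through the inherited descriptor. The second
shows the alternative, where each writer opens the file for itself and so
starts at offset 0. Neither uses O_APPEND.]

```python
import os


def write_via_forked_children(path, messages):
    """Open once, fork one child per message; return the final offset."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for message in messages:
            pid = os.fork()
            if pid == 0:
                # Child: write through the inherited descriptor, which
                # shares its file pointer with the parent.
                try:
                    os.write(fd, message)
                finally:
                    os._exit(0)
            os.waitpid(pid, 0)
        # The parent never wrote, yet its offset has moved.
        return os.lseek(fd, 0, os.SEEK_CUR)
    finally:
        os.close(fd)


def write_via_separate_opens(path, messages):
    """Truncate, then let each writer open the file independently."""
    os.close(os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644))
    for message in messages:
        fd = os.open(path, os.O_WRONLY)  # fresh entry, offset 0
        try:
            os.write(fd, message)
        finally:
            os.close(fd)
```

With the messages b"Hello\n" and b"world\n", the first function leaves both
lines in the file and the second leaves only "world".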

Tony Luck - Technical Manager, Root Computers Ltd. <aegl@root.co.uk>

roy@phri.UUCP (05/27/87)

In <5881@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
> The only reason there are [distinct per-system and per-process open file]
> tables rather than one is to permit the sharing of the file position
> pointer across a fork.  I don't see the necessity for this particular
> characteristic (I can't recall ever having made use of it)

	The last program I can think of that took advantage of this sharing
was the v6 shell (boy, I seem to be on a nostalgia trip these past few
days; maybe that old-timers BOF isn't such a bad idea).  Exit and goto (and
if) were not built in to the shell, but were fork/exec'ed just like any
other command.  Since they shared stdin with the shell (and the shell
didn't do buffered reads) they could do seeks on stdin and alter what line
in your shell script file the parent shell would read next.  Goto would
rewind stdin and search for the label, leaving the file pointer right after
it; exit would simply seek to EOF on stdin; when it exited, the shell would
see EOF and exit just as if you typed control-D.
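
[A rough imitation of that trick, in Python on a UNIX-like system. This is
not the v6 shell or its goto; the script text, the label line, and the
function names are made up for the example. A forked child seeks on a
descriptor it shares with the parent, and the parent's next unbuffered read
picks up from wherever the child left the pointer.]

```python
import os
import tempfile

# Hypothetical script text; the ": done" line plays the part of a label.
SCRIPT = b"echo one\ngoto done\necho skipped\n: done\necho two\n"
LABEL_LINE = b": done\n"


def seek_in_child(fd, offset, whence):
    """Fork a child whose only job is to move the shared file pointer."""
    pid = os.fork()
    if pid == 0:
        try:
            os.lseek(fd, offset, whence)
        finally:
            os._exit(0)
    os.waitpid(pid, 0)


def goto_and_exit_demo():
    """Return what the parent reads after a 'goto' and after an 'exit'."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, SCRIPT)
        os.lseek(fd, 0, os.SEEK_SET)

        # "goto": the child leaves the pointer right after the label.
        target = SCRIPT.index(LABEL_LINE) + len(LABEL_LINE)
        seek_in_child(fd, target, os.SEEK_SET)
        after_goto = os.read(fd, 100)

        # "exit": the child seeks to end of file, so the parent's next
        # read returns nothing, just as if the script had run out.
        os.lseek(fd, 0, os.SEEK_SET)
        seek_in_child(fd, 0, os.SEEK_END)
        after_exit = os.read(fd, 100)
        return after_goto, after_exit
    finally:
        os.close(fd)
        os.unlink(path)
```

The parent uses os.read() rather than a buffered file object for the same
reason the old shell could not buffer its input: data already read ahead
would not be affected by the child's seek.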

	I never thought to try this before, but I wonder what would have
happened if you did "(sleep 10; goto foo)&" inside a shell script.  Yuck!

	Modern shells have goto and exit, as well as if/then/else, for,
while, and the kitchen sink built in.  This makes shell scripts run faster.
It also makes them not fit into a 64k address space.

	BTW, I agree with Doug; when answering a question, it is better to
leave out some details if that makes the gist of the answer clearer.  The
questioner can always come back for more later.
-- 
Roy Smith, {allegra,cmcl2,philabs}!phri!roy
System Administrator, Public Health Research Institute
455 First Avenue, New York, NY 10016

guy%gorodish@Sun.COM (Guy Harris) (05/29/87)

> 	Modern shells have goto and exit, as well as if/then/else, for,
> while, and the kitchen sink built in.  This makes shell scripts run faster.
> It also makes them not fit into a 64k address space.

If you leave the kitchen sink out, you can fit it into a 64k address
space.  I've seen the Bourne shell and the PWB/UNIX 1.0 or Mashey
shell both run on a non-split-I&D-space PDP-11.  (The Bourne shell,
BTW, doesn't have "goto" built in; it's a "goto"less language.)
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com