[comp.std.unix] Unified I/O namespace: what's the point?

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (10/05/90)

Submitted-by: brnstnd@kramden.acf.nyu.edu (Dan Bernstein)

We all know that the best standards codify existing practice, while the
worst standards attempt to introduce new features without knowing what
they'll do. For example, POSIX 1003.1 has slaughtered some of my best
code and thrown huge roadblocks into my porting attempts, simply by
adding an unnecessary feature (sessions) that hadn't been proven to work
in the real world. It's a nice standard---except where it enters totally
uncharted territory.

Now we're looking at another possible addition to UNIX that hasn't been
widely tested: a unified namespace for opening all I/O objects. But we
already have a unified file descriptor abstraction for reading, writing,
and manipulating those objects, as well as passing them between separate
processes. Why do we need more?

I propose that we stop discussing this issue in comp.std.unix and start
implementing real-world solutions. My approach is to separate opening
and connecting into special programs, and stick to file descriptors for
almost all applications. If you have a different solution, such as
overloading open(), why don't you start playing with your library and
seeing what works? 

When we have a lot more real-world experience with various solutions,
we can come back here and consider standardization. Until then, ciao.

---Dan

Volume-Number: Volume 21, Number 187

ok@goanna.cs.rmit.OZ.AU (Richard A. O'Keefe) (10/08/90)

Submitted-by: ok@goanna.cs.rmit.OZ.AU (Richard A. O'Keefe)

In article <13220@cs.utexas.edu>, brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> Now we're looking at another possible addition to UNIX that hasn't been
> widely tested: a unified namespace for opening all I/O objects. But we
> already have a unified file descriptor abstraction for reading, writing,
> and manipulating those objects, as well as passing them between separate
> processes. Why do we need more?

If you have to use different functions for creating file descriptors
in the first place, then you haven't got a unified file descriptor
abstraction.  Suppose I want to write a "filter" program that will
merge two streams.  It would be nice if I could pass descriptors to
a program, but that's not how most UNIX shells work; I have to pass
strings.  Now, my filter knows what it *needs* (sequential reading
with nothing missing or out of order, but if the connection is lost
somehow it's happy to drop dead) so it could easily do
	fd = posix_open(argv[n], "read;sequential;reliable;soft");
and then it can use any file, device, or other abstraction which will
provide this interface.  My program *can't* know what's available.
If someone comes along with a special "open hyperspace shunt" function,
my program can't benefit from it.  If hyperspace shunts are in the
global name space and posix_open() understands their name syntax, my
program will work just fine.
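
To make this concrete, here is a sketch of that merge filter.  (The
posix_open() call and its attribute string are hypothetical, as above;
everything else is ordinary C.  The filter copies whatever each source
delivers, a chunk at a time, without knowing what kind of objects the
names denote.)

	#include <stdio.h>		/* BUFSIZ */
	#include <unistd.h>		/* read, write, close */

	extern int posix_open(const char *, const char *);	/* hypothetical */

	int main(int argc, char **argv)
	{
	    char buf[BUFSIZ];
	    int fd[2], live = 2, i, n;

	    if (argc != 3)
	        return 1;
	    for (i = 0; i < 2; i++)
	        if ((fd[i] = posix_open(argv[i + 1],
	                "read;sequential;reliable;soft")) < 0)
	            return 1;	/* this name can't provide what we need */
	    while (live > 0)
	        for (i = 0; i < 2; i++) {
	            if (fd[i] < 0)
	                continue;
	            n = read(fd[i], buf, sizeof buf);
	            if (n > 0)
	                write(1, buf, n);
	            else {		/* end of stream (or lost): drop it */
	                close(fd[i]);
	                fd[i] = -1;
	                live--;
	            }
	        }
	    return 0;
	}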

Surely this is the point?  We want our programs to remain useful when
new things are added that our programs could meaningfully work with.

I can see the point in saying "shared memory segments aren't much like
transput; let's keep them out of the global name space", but sockets
and NFS files and such *are* "transput-like".  Anything which will
support at least sequential I/O should look like a file.  If that
means that some things in the global name space are "real UNIX files"
with full 1003.1 semantics but some things aren't, that's ok as long
as my programs can find out whether they've got what they need.

One point to bear in mind is that application programs written in
C, Fortran, Ada, &c are likely to map file name strings in those
languages fairly directly to strings in the POSIX name space; to
keep something that _could_ have supported C, or Fortran, or Ada
transput requests out of the file name space is to make such things
unavailable to portable programs.  If some network connections can
behave like sequential files (even if they don't support full 1003.1
semantics), then why keep them out of reach of portable programs?

(I have used a system where a global name space was faked by the RTL.
Trouble is, different languages did it differently, if at all...)

Even shared memory segments *could* support read, write, lseek...

-- 
Fear most of all to be in error.	-- Kierkegaard, quoting Socrates.


Volume-Number: Volume 21, Number 190

ske@pkmab.se (Kristoffer Eriksson) (10/09/90)

Submitted-by: ske@pkmab.se (Kristoffer Eriksson)

In article <13220@cs.utexas.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>Now we're looking at another possible addition to UNIX that hasn't been
>widely tested: a unified namespace for opening all I/O objects.
>
>I propose that we stop discussing this issue in comp.std.unix and start
>implementing real-world solutions.

I am already running a system where a file name can lead to any kind of
I/O object. It works fine, as far as I can judge. What more should I do?

(Not everything that could be is implemented via file names in this system,
but there are networks and databases that are interfaced via this mechanism,
and I like it a lot. Server programs attach themselves to directory or file
names, and will take care of all file operations attempted by clients on
this file or directory.)

> My approach is to separate opening and connecting into special programs,
>and stick to file descriptors for almost all applications.

Doesn't your objection about the semantics of open() on network connections
fall down in that case? Do your special programs for obtaining the file
descriptors make the real semantics of network connections available to
the application any more than open() does?

I think file names are more useful. How, for instance, do you stick a
file descriptor that you obtained from one of your special programs into
the configuration file for some program? File names are readily suited
to that. If you just stick the network address into the file, the
application will be restricted to network connections (maybe only one
type of network, at that), and it will have to know how to access that
kind of connection.

> If you have a different solution, such as
>overloading open(), why don't you start playing with your library and
>seeing what works? 

Too static. In practice you will be freezing the top level of the name
space inside your library routines. With non-shared libraries this would
mean you'd have to recompile all your programs if you need to change
what kinds of objects you can access or how they are accessed. With
shared libraries this only requires recompiling the libraries, but that
still isn't something you'd like to do every day. With the entire name
space available through the filesystem, you can change the entire
hierarchy dynamically, and starting the server for some kind of object
can, as part of that same operation, establish the access path (just a
file name) through which it is accessed.
-- 
Kristoffer Eriksson, Peridot Konsult AB, Hagagatan 6, S-703 40 Oerebro, Sweden
Phone: +46 19-13 03 60  !  e-mail: ske@pkmab.se
Fax:   +46 19-11 51 03  !  or ...!{uunet,mcsun}!sunic.sunet.se!kullmar!pkmab!ske


Volume-Number: Volume 21, Number 193

chip@tct.uucp (Chip Salzenberg) (10/09/90)

Submitted-by: chip@tct.uucp (Chip Salzenberg)

According to ok@goanna.cs.rmit.OZ.AU (Richard A. O'Keefe):
>My program *can't* know what's available.  If someone comes along
>with a special "open hyperspace shunt" function, my program can't
>benefit from it.  If hyperspace shunts are in the global name space
>and posix_open() understands their name syntax, my program will work
>just fine.

Thank you, Richard, for stating well what I have intuitively felt.

(Dan, you wanted a reasoned rebuttal.  Very well: here it is.)

It is true that interactive use of UNIX, especially by programmers,
puts a lot of emphasis on the shell interface.  If such an environment
were all there were to Unix, then Dan's fd-centric view of the world
could possibly be useful.  To use Richard's example: when a hyperspace
shunt became available, its use would require only a change to the
shell source code and a recompilation.

However, the reality of modern Unix use is something else entirely:
pre-packaged utilities, usually available only as binaries, that for
practical purposes *cannot* be changed or replaced.  In this
environment, kernel features that require program customization are
unwieldy at best, useless at worst.  As long as shells fall into this
category -- "programs usually distributed as binaries" -- fd-centric
UNIX will never be practical.

One could argue that binary-only distribution is evil and should be
stopped.  I can agree that binaries are less useful than source code;
in fact, my personal motto is, "Unless you have source code, it isn't
software."  Nevertheless, copyright and trade secret law being what
they are, we will continue to see binary-only distributions for the
indefinite future.

Even if source code for all UNIX programs were freely available,
I doubt that anyone would seriously propose modifying *all* of them
each time a new kind of fd-accessible object were added to the kernel.

Finally, filenames often are stored in places where no shell will ever
see them, such as program-specific configuration files.  So in Dan's
hypothetical fd-centric UNIX, we would have to either (1) pass such
filenames to the shell for interpretation, thus incurring a possibly
substantial performance hit; or (2) modify each program to understand
all the names the shell would understand.  In my opinion, neither of
these alternatives is viable.

To summarize:

A unified namespace has one great advantage: new types of objects are
immediately available to all programs -- even the programs for which
you do not have the means or the desire to modify and recompile.
-- 
Chip Salzenberg at Teltronics/TCT     <chip@tct.uucp>, <uunet!pdn!tct!chip>
    "I've been cranky ever since my comp.unix.wizards was removed
         by that evil Chip Salzenberg."   -- John F. Haugh II


Volume-Number: Volume 21, Number 194

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (10/11/90)

Submitted-by: brnstnd@kramden.acf.nyu.edu (Dan Bernstein)

I was not planning to post further on this topic, but Chip has provided
some good arguments that deserve a proper rebuttal.

In article <13392@cs.utexas.edu> chip@tct.uucp (Chip Salzenberg) writes:
> It is true that interactive use of UNIX, especially by programmers,
> puts a lot of emphasis on the shell interface.  If such an environment
> were all there were to Unix, then Dan's fd-centric view of the world
> could possibly be useful.

The success of UNIX has proven how useful this ``fd-centric'' view is.

> To use Richard's example: when a hyperspace
> shunt became available, its use would require only a change to the
> shell source code and a recompilation.

You are making an unwarranted assumption here: that the shell *has* to
handle all types of fd creation. It's convenient, of course, but by no
means necessary. My TCP connectors, for example, are implemented outside
the shell.
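
For the record, a connector reduces to something like this sketch (the
program name and argument layout are mine, not anything standardized,
and error reporting is trimmed). It makes the connection, then hands the
descriptor to an arbitrary program as stdin and stdout; that program
never calls socket() or connect():

	/* tcpconn host port prog [args...] -- a hypothetical connector */
	#include <sys/types.h>
	#include <sys/socket.h>
	#include <netinet/in.h>
	#include <netdb.h>
	#include <string.h>
	#include <stdlib.h>
	#include <unistd.h>

	int main(int argc, char **argv)
	{
	    struct sockaddr_in sin;
	    struct hostent *hp;
	    int fd;

	    if (argc < 4 || !(hp = gethostbyname(argv[1])))
	        return 1;
	    memset(&sin, 0, sizeof sin);
	    sin.sin_family = AF_INET;
	    sin.sin_port = htons(atoi(argv[2]));
	    memcpy(&sin.sin_addr, hp->h_addr, hp->h_length);
	    if ((fd = socket(AF_INET, SOCK_STREAM, 0)) < 0)
	        return 1;
	    if (connect(fd, (struct sockaddr *) &sin, sizeof sin) < 0)
	        return 1;
	    dup2(fd, 0);		/* the connection becomes stdin... */
	    dup2(fd, 1);		/* ...and stdout */
	    close(fd);
	    execvp(argv[3], argv + 3);	/* run the application on it */
	    return 1;			/* exec failed */
	}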

> However, the reality of modern Unix use is something else entirely:
> pre-packaged utilities, usually available only as binaries, that for
> practical purposes *cannot* be changed or replaced.  In this
> environment, kernel features that require program customization are
> unwieldy at best, useless at worst.  As long as shells fall into this
> category -- "programs usually distributed as binaries" -- fd-centric
> UNIX will never be practical.

This is also unfounded. My TCP connectors provide a counterexample to
your hypothesis (that the shell must handle everything and hence be
recompiled) and your conclusion (that fd-centric UNIX doesn't work).
Any programming problem can be solved by adding a level of indirection.

> One could argue that binary-only distribution is evil and should be
> stopped.

I do, in fact, think exactly that. But I will not use it as a basis for
my arguments.

> Finally, filenames often are stored in places where no shell will ever
> see them, such as program-specific configuration files.  So in Dan's
> hypothetical fd-centric UNIX, we would have to either (1) pass such
> filenames to the shell for interpretation, thus incurring a possibly
> substantial performance hit; or (2) modify each program to understand
> all the names the shell would understand.  In my opinion, neither of
> these alternatives is viable.

On the contrary. syslog is a counterexample. While it is hardly as
modular as I would like, it shows that (0) an fd-centric model works;
(1) you do not need to invoke the shell or any other process, and you do
not need to incur a performance hit; (2) you do not need to modify each
program to understand everything that the syslogd program can. syslog
has proven quite viable.
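
(For reference, the application side of the syslog model is nothing but
a library call; the library, not the program, knows how to reach
syslogd. The program name below is made up:)

	#include <syslog.h>

	int main(void)
	{
	    openlog("myprog", 0, LOG_DAEMON);	/* optional: set identity */
	    syslog(LOG_INFO, "logged without invoking a shell");
	    return 0;
	}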

Provided that there is a message-passing facility available, and
provided that it has sufficient power to pass file descriptors (which is
true both under BSD's UNIX-domain sockets and under System V's streams),
the syslog model will generalize to any I/O mechanism without loss of
efficiency. open() can always be replaced by a write() to the facility
followed by a file descriptor transfer. This is just as easy to do
outside the kernel as inside the kernel; therefore it should be outside.
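
To be concrete, the transfer itself is only a few lines. Here is a
sketch of the sending side over a BSD UNIX-domain socket, written with
the control-message interface (4.3BSD spells this msg_accrights; the
System V streams equivalent is the I_SENDFD ioctl). The receiving side
is the same dance with recvmsg():

	#include <sys/types.h>
	#include <sys/socket.h>
	#include <sys/uio.h>
	#include <string.h>

	/* ship descriptor fd to the peer on UNIX-domain socket sock */
	int send_fd(int sock, int fd)
	{
	    struct msghdr msg;
	    struct iovec iov;
	    struct cmsghdr *cmsg;
	    char ctl[CMSG_SPACE(sizeof fd)];
	    char byte = 0;

	    memset(&msg, 0, sizeof msg);
	    iov.iov_base = &byte;	/* must carry at least one data byte */
	    iov.iov_len = 1;
	    msg.msg_iov = &iov;
	    msg.msg_iovlen = 1;
	    msg.msg_control = ctl;
	    msg.msg_controllen = sizeof ctl;
	    cmsg = CMSG_FIRSTHDR(&msg);
	    cmsg->cmsg_level = SOL_SOCKET;
	    cmsg->cmsg_type = SCM_RIGHTS;
	    cmsg->cmsg_len = CMSG_LEN(sizeof fd);
	    memcpy(CMSG_DATA(cmsg), &fd, sizeof fd);
	    return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
	}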

> To summarize:
> A unified namespace has one great advantage: new types of objects are
> immediately available to all programs -- even the programs for which
> you do not have the means or the desire to modify and recompile.

To summarize: I believe I've provided counterexamples to each of your
arguments and conclusions, and so I continue to maintain that a unified
namespace is pointless. There is no need to recompile any programs just
to provide a new I/O mechanism.

A unified namespace has several great disadvantages:

1. It provides a competing abstraction with file descriptors, hence
adding complexity to the kernel, and giving vendors two different
outlets for extensions. This will result in a confused system, where
some features are available only under one abstraction or the other.

2. It is not clear that all sensible I/O objects will fit into one
namespace. If the precedent of a unified namespace is established now,
I/O objects that don't fit will be much harder to add later.

3. A unified namespace has not been tested on a large scale in the real
world, and hence is an inappropriate object of standardization at this
time.

---Dan


Volume-Number: Volume 21, Number 195

peter@ficc.ferranti.com (Peter da Silva) (10/12/90)

Submitted-by: peter@ficc.ferranti.com (Peter da Silva)

In article <13441@cs.utexas.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> In article <13392@cs.utexas.edu> chip@tct.uucp (Chip Salzenberg) writes:
> > It is true that interactive use of UNIX, especially by programmers,
> > puts a lot of emphasis on the shell interface.  If such an environment
> > were all there were to Unix, then Dan's fd-centric view of the world
> > could possibly be useful.

> The success of UNIX has proven how useful this ``fd-centric'' view is.

Not at all. You can equally argue that it proves how useful the "unified
name space" view is, because *that* is another of the features that marks
UNIX as something new. Or that it proves the "filter" concept, or any of
the other things that *as a whole* go to making UNIX what it is.

UNIX is synergy.

> This is also unfounded. My TCP connectors provide a counterexample to
> your hypothesis (that the shell must handle everything and hence be
> recompiled) and your conclusion (that fd-centric UNIX doesn't work).
> Any programming problem can be solved by adding a level of indirection.

OK, how do you put your TCP connectors into /etc/inittab as terminal
types? Or into /usr/brnstnd/.mailrc as mailbox names? Or into any other
program that expects filenames in configuration scripts (remember, not
all scripts are shell scripts).

> A unified namespace has several great disadvantages: 1. It provides a
> competing abstraction with file descriptors,

No, it adds a complementary abstraction to file descriptors. In fact, a
unified name space and file descriptors together form an abstraction that
is at the heart of UNIX: everything is a file. A file has two states: passive,
as a file name; and active, as a file descriptor.

> This will result in a confused system, where some features are available
> only under one abstraction or the other.

Which is what you seem to be advocating.

> A unified namespace has not been tested on
> a large scale in the real world, and hence is an inappropriate object of
> standardization at this time.

I would like to suggest that UNIX itself proves the success of a unified
namespace.
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
peter@ferranti.com



Volume-Number: Volume 21, Number 200

flee@guardian.cs.psu.edu (Felix Lee) (10/16/90)

Submitted-by: flee@guardian.cs.psu.edu (Felix Lee)

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>On the contrary. syslog is a counterexample. While it is hardly as
>modular as I would like, it shows that (0) an fd-centric model works;

syslog shows the limitations of an fd-centric model.  B News, for
example, writes log entries in the files "log" and "errlog".  You
cannot redirect this into syslog without modifying code.

If syslog existed in the filesystem namespace, you might
	ln -s /syslog/news.info log
	ln -s /syslog/news.err errlog
or maybe even
	ln -s ~/mylog/news.err errlog
and everything would work.

Why should I have to teach all my programs about syslog when I can
just write to a filesystem object instead?
--
Felix Lee	flee@cs.psu.edu


Volume-Number: Volume 21, Number 204