[mod.protocols.tcp-ip] NFS

bzs@BU-CS.BU.EDU.UUCP (12/20/86)

It seems unfair to cast aspersions on those who have pioneered Network
File Systems as if their implementations were somehow finished or
immutable. Praise should be given for how far the publication of their
efforts has brought us in thinking about the issues (and for the
credibility it lends to the idea that they are worth thinking about.)

I can think of another major networking protocol which prides itself
on having been put into practice early in its design cycle and
corrected where needed (sometimes radically) based upon concrete use
rather than paper committee meetings. The name escapes me, however.

Issues like "file organization" between heterogeneous systems have
been raised for years. I know of no protocol which attempts to solve
this in general (although a few special cases -do- go a long way.)

The last time someone raised this issue in my office, I asked him
whether this problem had yet been solved for magnetic tapes on his
system. If so, I proposed, I could adapt that solution to FTP (the
case in point at the time) easily enough. Needless to say, he walked
away in a huff.

Some of these issues are HARD, very hard! I wouldn't go so far as to
say insoluble (I mean, people do seem to solve them manually) but I
think this difficulty should be considered before saying that XYZ does
not solve this. Proposals for solutions would be most welcome.

I think the problem is that people either want perfect and general
solutions or they throw up their hands entirely. My suspicion is that
the best solution will be the ability to code modules at an
application level to handle the various permutations of file access
methods between systems and let the libraries blossom out of the user
community. Extensibility seems to be the key need here. And practice.

As a more concrete example, why shouldn't FTP allow me to specify
input and output filter programs on both ends, provided as a library
by the systems? The same sort of thing should work for Network File
Systems, although the ability to type files and have these "daemons"
invoked automatically would probably be the right approach. Given a
few years of that I suspect the "standards" would begin to reveal
themselves.
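The filter idea above can be sketched concretely. The snippet below is a hypothetical illustration (in Python, as a modern stand-in; it is not part of any actual FTP implementation): before sending a file, the server pipes its bytes through a site-supplied output filter program, here a trivial `tr` command standing in for a real format converter.

```python
import subprocess

def send_with_filter(path, filter_cmd):
    """Run a site-supplied output filter over a file's bytes before
    transfer -- the filter program is the extensibility hook the text
    proposes; which filters exist would be up to each system's library."""
    with open(path, "rb") as f:
        result = subprocess.run(filter_cmd, stdin=f,
                                capture_output=True, check=True)
    return result.stdout  # the bytes as they would go on the wire

# Example: a trivial "filter" that uppercases the stream.
with open("/tmp/demo.txt", "wb") as f:
    f.write(b"hello nfs\n")
wire = send_with_filter("/tmp/demo.txt", ["tr", "a-z", "A-Z"])
```

A matching input filter on the receiving end would undo or adapt the transformation, giving each site a place to hang its own conversions.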

	-Barry Shein, Boston University

bzs@BU-CS.BU.EDU (Barry Shein) (12/23/86)

I think there is a misconception brewing here about UNIX file
semantics. It is true that the low-level UNIX system calls (e.g. OPEN,
READ, WRITE, LSEEK) impose no structure on a file except as a stream
of bytes. This is not peculiar to UNIX; most any O/S that I know of
has some way to just get the bytes off the disk, although systems which
prefer structured/typed files tend to resist that and steer the user
toward an access method. Of course, a processor with a sophisticated
IOP (e.g. a database back-end) might be an exception to this rule, but
I believe such situations are beyond the scope of this discussion.

Any access method could be layered on top of the UNIX low level calls,
and many have been. Surely I could write DEC's RMS or IBM's access
methods in terms of these simple calls.
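To make the layering claim concrete, here is a toy record-oriented access method built on nothing but the byte-stream calls (open/lseek/read/write), in the spirit the text describes. The class name and record size are made up for illustration; a real RMS or ISAM layer would of course be far richer.

```python
import os

class FixedRecordFile:
    """A toy fixed-length-record access method layered on the plain
    byte-stream system calls, as the text argues any access method
    could be.  Record n lives at byte offset n * recsize."""
    def __init__(self, path, recsize):
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
        self.recsize = recsize

    def put(self, n, data):
        assert len(data) <= self.recsize
        os.lseek(self.fd, n * self.recsize, os.SEEK_SET)
        os.write(self.fd, data.ljust(self.recsize, b"\0"))

    def get(self, n):
        os.lseek(self.fd, n * self.recsize, os.SEEK_SET)
        return os.read(self.fd, self.recsize).rstrip(b"\0")

rf = FixedRecordFile("/tmp/records.dat", 32)
rf.put(0, b"first record")
rf.put(1, b"second record")
```

Everything record-like here is application convention; the O/S still sees only bytes.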

As a specific example, consider the UNIX DBM calls, which store
arbitrary data as hashed key/value pairs. This presents the same
problem as any more structured system (e.g. how would I fetch the next
key/value pair out of such a file from a remote, non-UNIX system? No
different, really, than fetching the next ISAM record, etc.)
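For readers unfamiliar with it, the key/value interface in question looks roughly like this (sketched with Python's `dbm` wrapper as a stand-in for the classic C dbm(3) calls; the file path is arbitrary):

```python
import dbm

# Store hashed key/value pairs, then walk them -- the "fetch the
# next pair" operation is exactly what a remote, non-UNIX client
# would have trouble expressing over a plain byte stream.
db = dbm.open("/tmp/demo_db", "c")   # "c": create if absent
db[b"alpha"] = b"1"
db[b"beta"] = b"2"
pairs = {k: db[k] for k in db.keys()}  # enumerate in storage order
db.close()
```

The on-disk bytes are just bytes; only the library's hashing convention makes them a database.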

This is somewhat in response to Geoff's note (which was a very good
direction for thought.) I am only saying that the problem is entirely
symmetrical; there is no magic property of access methods, whether
built into an O/S or supplied as application libraries: bytes is
bytes.  The only possible difference is that a system that provides
many access methods might be able to quickly enumerate the access
methods its users are probably using (give or take how the users
employed the various options such as record size, blocking,
bucket size, etc.)

I only bring this up so that we don't wring our hands over what I
believe to be a common misconception. Any O/S could (I presume)
present its files as a stream of bytes; the problems would then
be symmetrical.

There are some differences, such as guaranteed atomicity of updates
and types of failure (eg. how extents are handled) but I don't believe
this level of detail is yet where this discussion has found itself
and, I suspect, would be solvable within any scheme that solves the
other, more salient problems.

However, unlike Geoff, I am more pessimistic IN THE GENERAL CASE.

If two systems have a !very! similar access method, such as an ISAM
implementation, then writing an interface between the two should be
relatively straightforward (although it is still fraught with danger;
e.g. IBM's V-record format uses 16 bits to express lengths, while
another system may support a V-record format without using 16-bit
lengths: how compatible could you make those two access methods?) In
the case where the access method doesn't exist at all on one side, I
can't see how it could be utilized at all (oh, I suppose a V-record
could be returned to a text-oriented application as "string<CR><LF>",
but that sort of thing is limited as a solution.)
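The V-record situation can be sketched as follows. This is a deliberate simplification (a bare 16-bit big-endian length prefix per record, not IBM's full record descriptor word), meant only to show both the 65535-byte ceiling a 16-bit length imposes and the "string<CR><LF>" fallback mentioned above:

```python
import struct

def parse_v_records(buf):
    """Parse a simplified V-record stream: each record is prefixed by
    a 16-bit big-endian length.  The 16-bit field is the point of the
    example -- it caps any record at 65535 bytes, which a system with
    wider lengths cannot round-trip through."""
    out, i = [], 0
    while i < len(buf):
        (n,) = struct.unpack_from(">H", buf, i)
        out.append(buf[i + 2:i + 2 + n])
        i += 2 + n
    return out

def to_text(records):
    # The limited "string<CR><LF>" fallback for text-oriented clients.
    return b"".join(r + b"\r\n" for r in records)

data = struct.pack(">H", 5) + b"hello" + struct.pack(">H", 5) + b"world"
```

Reconciling two such formats means picking whose length field wins, which is exactly the prescriptive choice discussed below.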

I won't even mention the Fortran programmer who would like to access
a file full of 128-bit binary floating-point values via this NFS (no,
XDR doesn't help unless someone knows it's time to employ it, and even
then it may not work: does your machine have 128-bit floats?)
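The XDR point can be made concrete: XDR as then specified defines 32- and 64-bit IEEE floats but nothing wider, so a native 128-bit value has no lossless encoding. A sketch of the 64-bit case, using Python's `struct` as a stand-in for an XDR library:

```python
import struct

def xdr_double(x):
    """Encode a value as an XDR double: 8 big-endian IEEE-754 bytes.
    A native 128-bit float would have to be rounded into these 64
    bits -- the lossy case the text warns about."""
    return struct.pack(">d", x)

encoded = xdr_double(0.1)
decoded = struct.unpack(">d", encoded)[0]  # round-trips the 64 bits exactly
```

The round trip is exact only because the value already fit in 64 bits; the wider-than-the-wire-format case is where XDR stops helping.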

I don't think it's insoluble, but I do suspect we will have to be
prescriptive (rather than descriptive) to provide a standard. Given
a standardized menu of network access methods could *you* do your
work?

	-Barry Shein, Boston University

bzs@BU-CS.BU.EDU.UUCP (12/26/86)

I am surprised that so much of this discussion has centered around
pathnames. That always seemed to me to be one of the easier things
to either fake or punt (fake: use UNIX syntax on a UNIX workstation
as NFS does; punt: use a quoted syntax such as the PUP/Leaf convention
of {HOST_OR_DEVICE}<any_string_the_other_os_can_interpret>.)
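The "punt" approach is cheap precisely because the client only has to peel off the host part and pass the rest through uninterpreted. A minimal sketch (illustrative only; this is not any particular protocol's actual grammar):

```python
import re

def split_punt_path(path):
    """Split the quoted 'punt' syntax mentioned above,
    {HOST_OR_DEVICE}<rest_for_the_other_os>, into its two parts.
    The remainder is handed to the other O/S verbatim."""
    m = re.fullmatch(r"\{([^}]+)\}(.*)", path)
    if not m:
        raise ValueError("not in {host_or_device}rest form")
    return m.group(1), m.group(2)

# e.g. a VMS-style remainder the local system never interprets:
host, rest = split_punt_path("{VAXB}DSK1:[USER.SRC]MAIN.FOR")
```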

Of course, there is always the possibility of coming up with a
standard, universal catalogue syntax, similar in spirit I guess to the
Library of Congress' universal conventions for finding something.
Then we could all either use that syntax or at least support it.

I always thought it was file formats (access methods) that were the
problem (or have we decided that this is too hopeless to even think
about?)

Maybe we need to make a list of issues, here's mine:

        1. File naming.
        2. Path naming.
        3. File formats and access methods (e.g. ISAM, stream...)
        4. File access semantics (e.g. atomicity of updates,
           error handling, authorization, etc. etc. etc.)
       (5. Performance?)

        -Barry Shein, Boston University