[mod.protocols.tcp-ip] NFS could support general case pathnames pretty easily...

gnu@lll-crg.ARPA@hoptoad.UUCP (12/27/86)

There has been a migration from (mostly useful) criticism of Sun's NFS
to discussion and criticism of network file systems in general.  This is
OK, we should just not get confused about what we are refuting.

>     > As I see it, the problem is really the lack
>     > of support for anything but UN*X filesystem syntax in UN*X.  
> 
>     Since Unix filename syntax is a sequence of chars terminated
>     by a null (some systems have a maximum length, generally not
>     less than about 1024 bytes), its hard to see how this is much
>     of a problem.
> 
> Consider TOPS-20 directory structure, VM/CMS mini-disks, file system
> that permit "/" characters in their filenames, or especially file
> systems (like VMS and VM/CMS) that have structured (non-byte stream)
> files.  None of them map very quietly into a hierarchical set of
> directories separated by "/" characters, and there are more and harder
> where they came from.

Requiring Unix pathname syntax can be considered a fixable bug in NFS,
and (in an operating system with pretty flexible file names like Berkeley
Unix) need not be an issue in network file system access.

While I was at Sun, I suggested to the NFS group that the basic file
name lookup operations should be passed the entire file name and should
return "whatever part they had not handled".  This would allow the
system which actually implements the disk file structure to parse the
names involved.  For example, on Unix you could do:

	% cd /vm.cms
	% more 'humm zumm a'
	% cd /dg.aos
	% mail <:udd:toad:music:spheres tcp-ip@sri-nic.arpa
	% cd /rsx11m
	% grep meaning '[34,5678]mans.src;4'
or even
	% grep honestman '/vm.cms/whole world a' '/rsx11m/[1,2]buckle.shu'
and presumably if you wanted to spend a lot of time in e.g. a CMS file system,
you could write a "cms shell" that would parse your commands the way the CMS
interpreter does, rather than the way that's convenient for Unix.
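
A minimal sketch of the "pass the whole name, get back what was not
handled" idea, in Python.  Everything here is hypothetical: the function
names, the mount table, and the use of a plain list as a file handle are
illustration only, not part of NFS.

```python
# Each file system type supplies its own lookup.  A lookup receives the
# whole remaining name and returns (new_handle, unparsed_remainder).

def unix_lookup(handle, name):
    """Unix-style server: consume one component, up to the next '/'."""
    component, _, rest = name.partition("/")
    return handle + [component], rest


def cms_lookup(handle, name):
    """CMS-style server: 'fname ftype fmode' is a single file name,
    so the server consumes all of it and returns nothing unparsed."""
    fname, ftype, fmode = name.split()
    return handle + [(fname, ftype, fmode)], ""


# Hypothetical mount table: which lookup takes over below which handle.
MOUNTS = {("vm.cms",): cms_lookup}


def resolve(path):
    """Walk a path, letting whichever system owns each part parse it."""
    handle, rest, lookup = [], path.lstrip("/"), unix_lookup
    while rest:
        handle, rest = lookup(handle, rest)
        # Crossing a mount point switches to that system's parser.
        lookup = MOUNTS.get(tuple(handle), lookup)
    return handle


print(resolve("/vm.cms/whole world a"))
print(resolve("/usr/lib/libc.a"))
```

The local system never needs to know that a CMS name contains blanks; it
only hands over the unparsed remainder and keeps going until nothing is
left.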

However, this idea did not reach the NFS group until they were too far
along to consider it for the first release.  I still think it would be
a good idea, and it could be done by adding another remote procedure to
call with these semantics.  If an NFS client implementation tried this
call and it failed, it could remember that fact, never try it again in
that filesystem, and revert to the old way (parse up to the next "/"
character and call the old "look up name" procedure), providing
backward compatibility.
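
The fallback could look roughly like the following Python sketch.  The
server classes, the ProcedureUnavailable exception, and the method names
lookup and lookup_path are all made up for illustration; a real client
would see an RPC-level "procedure unavailable" error instead.

```python
class ProcedureUnavailable(Exception):
    """Stand-in for the RPC error a server returns for an unknown procedure."""


class OldServer:
    """Server that only knows the old one-component lookup."""

    def __init__(self):
        self.rejected = 0  # how many times the new call was attempted

    def lookup(self, handle, component):
        return handle + (component,)

    def lookup_path(self, handle, name):
        self.rejected += 1
        raise ProcedureUnavailable()


class NewServer(OldServer):
    """Server that also accepts a whole name and returns the remainder."""

    def lookup_path(self, handle, name):
        component, _, rest = name.partition("/")
        return handle + (component,), rest


class Client:
    def __init__(self, server):
        self.server = server
        self.try_new_call = True  # remembered per mounted filesystem

    def resolve(self, name):
        handle = ()
        while name:
            if self.try_new_call:
                try:
                    handle, name = self.server.lookup_path(handle, name)
                    continue
                except ProcedureUnavailable:
                    self.try_new_call = False  # never try it again here
            # Old way: parse up to the next "/" and look up one component.
            component, _, name = name.partition("/")
            handle = self.server.lookup(handle, component)
        return handle


old = OldServer()
client = Client(old)
client.resolve("a/b/c")
client.resolve("d/e")
print(old.rejected)
```

The old server sees the new call exactly once per mounted filesystem,
however many names are resolved afterwards.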

There would still be a few loose ends, e.g. a leading slash (or some
other local convention on non-Unix systems) would still indicate global
rather than current-directory file naming, making it hard to get at
remote files in your current directory that begin with "/".  Also,
programs written for a particular operating system would make
assumptions about how to reach a parent directory, or what characters
are valid in file names, or how to turn "buggy.c" into "buggy.o" or
"hairy.txt" into "hairy.tqt" which would often prevent the use of
arbitrary remote files by those commands.  But no network file system
will solve that problem by itself.

>                                               mapping everything to UN*X
> syntax is a lot easier on UN*X applications than changing them all to
> handle a pathname representation designed to facilitate operations in a
> heterogeneous environment.  As it happens, the Symbolics environment
> represents pathnames and file system operations in a way that is
> optimized to heterogeneous environments.

I think Berkeley did a good enough job, especially given what they
started with and what System V still labors under.

If the filename parsing bugs in Sun's NFS are fixed, what will start
showing up are these application program bugs.  While Lisp is a nice
language, I don't think we should rewrite the world in it to solve our
pathname representation problems.  Better to define some standard hooks
and encourage their availability in a wide variety of environments,
e.g. dir_part(filename) would give the directory name;
file_part(filename) would give the filename within the directory;
parent_of_dir(dirname); file_in_dir(file, dir) would concatenate the
name of the dir and file; dir_in_dir(dir1, dir2) would concatenate two
directory names; root_dir() return the root of the naming scheme, if
any; etc.  I suspect that a set of 10 or 15 of these at most would be
enough to provide a 99% solution to writing applications that don't
embed file-naming-specific information, given an operating system that
supports hierarchy.
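
As a rough illustration, here is a Python sketch of such hooks for naming
schemes that differ only in their separator.  The class name and the
sibling() helper are hypothetical, and two points are assumptions: that
dir_in_dir(dir1, dir2) places dir1 inside dir2 (by analogy with
file_in_dir), and that the ":udd:toad:music:spheres" style behaves like
Unix with ":" in place of "/".

```python
class SeparatorNames:
    """Hooks for a hierarchy whose components are joined by one separator."""

    def __init__(self, sep):
        self.sep = sep

    def root_dir(self):
        return self.sep

    def dir_part(self, filename):
        head, sep, _ = filename.rpartition(self.sep)
        if head:
            return head
        # "/foo" lives in the root; "foo" has no directory part at all.
        return self.sep if sep else ""

    def file_part(self, filename):
        return filename.rpartition(self.sep)[2]

    def parent_of_dir(self, dirname):
        if dirname == self.sep:
            return self.sep  # the root is treated as its own parent
        return self.dir_part(dirname)

    def file_in_dir(self, file, dir):
        if dir == "":
            return file
        if dir.endswith(self.sep):
            return dir + file
        return dir + self.sep + file

    def dir_in_dir(self, dir1, dir2):
        return self.file_in_dir(dir1, dir2)


def sibling(names, filename, other):
    """Application code: name a file next to another one, using only hooks."""
    return names.file_in_dir(other, names.dir_part(filename))


unix = SeparatorNames("/")
aos = SeparatorNames(":")
print(sibling(unix, "/usr/src/buggy.c", "buggy.o"))
print(sibling(aos, ":udd:toad:music:spheres", "notes"))
```

The sibling() function contains no separator of its own, which is the
point: the file-naming knowledge stays inside the hooks.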

Some of these would be tricky, e.g. MSDOS does not implement a true
hierarchy, but has a separate root on each device (e.g. "a:foo" gets foo
in the current directory on disk a, while "a:/foo" gets foo in the root
directory on a, and "/foo" gets foo in the root on the current
disk).  My impression is that VMS is similar, though I've never worked
with it.
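
The three MSDOS cases can be told apart mechanically.  The sketch below
uses splitdrive from Python's standard ntpath module to peel off the
drive; the classify() function and its current_drive parameter are
hypothetical names for illustration.

```python
import ntpath


def classify(name, current_drive="c:"):
    """Return (drive, starting point, remaining name) for a DOS-style name."""
    drive, rest = ntpath.splitdrive(name)
    rooted = rest.startswith(("/", "\\"))
    start = "root" if rooted else "current directory"
    return drive or current_drive, start, rest.lstrip("/\\")


for example in ("a:foo", "a:/foo", "/foo"):
    print(example, classify(example))
```

A root_dir() hook for such a system would therefore need a device
argument, or would have to return one root per device.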

Of course, in a system that implemented NFS, these routines would have
to RPC to the remote system at run time to determine its conventions,
so they might not be cheap, but they would be correct.

Margulies@SAPSUCKER.SCRC.SYMBOLICS.COM.UUCP (12/27/86)

    Date: Sat, 27 Dec 86 00:10:59 PST
    From: hoptoad!gnu@lll-crg.ARPA (John Gilmore)

    There has been a migration from (mostly useful) criticism of Sun's NFS
    to discussion and criticism of network file systems in general.  This is
    OK, we should just not get confused about what we are refuting.

First, I was refuting the claim that Symbolics is taking a partisan
position.

Second, I was refuting the claim that just passing strings around in an
operating system is the best scheme for handling heterogeneous
pathnames.

Third, I was pointing out that there are two very different purposes for
a network file system under discussion: allowing other computers to
serve as an extension to "your" file system, and allowing access to
other computers' file systems.  

In the first case, which appears to be represented by the existing NFS
implementation, it's okay that different hosts disagree on the names of
files, and that file access is constrained by the model of the local
host.  (An aside: I still have trouble seeing why anyone would want to
try to get work done in an environment with 200 workstations, few of
them with significant local disk storage, and many of them disagreeing
on the names of files on the shared resources. If they are all UN*X
boxes, then careful administration can enforce agreement, but still,
accidents will happen.)

In the second case, it just won't do to try to give all files on any
host the same appearance.  The principle should be to allow uniform
access to common capabilities, but also transparent access to particular
ones.

As a standard for allowing any host to supply some file system for
UN*X'es on the network, NFS is fine.  As a standard for allowing
heterogeneous computers to access each other's files as \peers/, the NFS
protocol may still prove fine. The NFS UN*X interface seems to be a
problem, and that's what this discussion is turning toward.

    >     > As I see it, the problem is really the lack
    >     > of support for anything but UN*X filesystem syntax in UN*X.  
    > 
    >     Since Unix filename syntax is a sequence of chars terminated
    >     by a null (some systems have a maximum length, generally not
    >     less than about 1024 bytes), its hard to see how this is much
    >     of a problem.
    > 
    > Consider TOPS-20 directory structure, VM/CMS mini-disks, file system
    > that permit "/" characters in their filenames, or especially file
    > systems (like VMS and VM/CMS) that have structured (non-byte stream)
    > files.  None of them map very quietly into a hierarchical set of
    > directories separated by "/" characters, and there are more and harder
    > where they came from.

    Requiring Unix pathname syntax can be considered a fixable bug in NFS,
    and (in an operating system with pretty flexible file names like Berkeley
    Unix) need not be an issue in network file system access.

My understanding is that the \protocol/ has no pathname syntax
requirements, only the \interface/ through the UN*X file system, which
is not the same thing.  The NFS product may include both, but from the
point of view of us non-UN*X would-be protocol implementors, they are
very different.  That is what Steve Sneddon started out trying to say
when a brick was thrown through his console.

    >                                               mapping everything to UN*X
    > syntax is a lot easier on UN*X applications than changing them all to
    > handle a pathname representation designed to facilitate operations in a
    > heterogeneous environment.  As it happens, the Symbolics environment
    > represents pathnames and file system operations in a way that is
    > optimized to heterogeneous environments.

    I think Berkeley did a good enough job, especially given what they
    started with and what System V still labors under.

In your next paragraph you begin to hit some of the problems with
pathnames represented as a string as opposed to a data structure. The
issue is not C versus Lisp (versus Forth?), it is data structure.

Here is a vaguely language-independent shot at describing Symbolics'
object-oriented pathname representation.

You can extract the following fields from any pathname:

    host         (another complex object)
    device       a string or null
    directory    a list of strings
    name         a string
    type         a string or null
    version      a number, or a keyword for newest, oldest, or the like,
                 or null

In theory, a pathname could have other, host-specific fields.

Any field can have a wild-card.

You can ask for a new pathname based on an old pathname by supplying any
or all of the fields.  

You can do wild-card matching of any pathname against any other.

Note that the particular per-host string format and delimiters are
irrelevant to programs that want to process pathnames.  Only the
pathname system has to be able to parse strings.  The standard parser is
used by all user interfaces to pathnames.

You can ask for a string -- the format of the string depends on the type
of host. 

When you open a file, the type of host determines much of what happens.
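
For readers who think better in code, here is a small Python sketch of
the structure described above.  It is an approximation under stated
assumptions, not the Symbolics implementation: the host is reduced to a
string, directory wild-cards must match component by component, and the
two string formats ("unix" and a bracket style loosely modeled on VMS)
are simplified illustrations.

```python
from dataclasses import dataclass, replace
from fnmatch import fnmatchcase
from typing import Optional, Tuple, Union


def _field_matches(value, pattern):
    """A pattern of "*" matches anything; strings allow shell-style wild-cards."""
    if pattern == "*":
        return True
    if isinstance(value, str) and isinstance(pattern, str):
        return fnmatchcase(value, pattern)
    return value == pattern


@dataclass(frozen=True)
class Pathname:
    host: str                        # simplified; really a complex object
    device: Optional[str]
    directory: Tuple[str, ...]
    name: str
    type: Optional[str]
    version: Union[int, str, None]   # number, keyword such as "newest", or None

    def new(self, **fields):
        """New pathname based on this one, with any fields replaced."""
        return replace(self, **fields)

    def matches(self, pattern):
        """Wild-card match of this pathname against another pathname."""
        if pattern.directory != "*":
            if len(self.directory) != len(pattern.directory):
                return False
            pairs = zip(self.directory, pattern.directory)
            if not all(_field_matches(v, p) for v, p in pairs):
                return False
        simple = ("host", "device", "name", "type", "version")
        return all(_field_matches(getattr(self, f), getattr(pattern, f))
                   for f in simple)


def _base(p):
    return p.name if p.type is None else f"{p.name}.{p.type}"


def unix_string(p):
    return "/" + "/".join(p.directory + (_base(p),))


def bracket_string(p):
    text = f"[{'.'.join(p.directory)}]{_base(p)}"
    if p.device is not None:
        text = f"{p.device}:{text}"
    if isinstance(p.version, int):
        text += f";{p.version}"
    return text


# Only this table knows about delimiters; programs work on the fields.
FORMATTERS = {"unix": unix_string, "bracket": bracket_string}

src = Pathname("hoptoad", None, ("usr", "src"), "buggy", "c", None)
obj = src.new(type="o")
print(FORMATTERS["unix"](obj))
print(FORMATTERS["bracket"](obj.new(device="disk1", version=4)))
```

Turning "buggy.c" into "buggy.o" is a one-field replacement here, with no
knowledge of where the dot goes on any particular host.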

We have found this basic scheme (with a lot of complicated details in
the implementation) sufficient to cover all the hosts we have ever
talked to, and that's a lot.  I see no reason other than compatibility
why such a thing couldn't exist on UN*X.  Mind you, I am NOT saying 'you
UN*X hackers should get on the ball and implement this.'  My entire goal
is to point out that we (probably amongst others) have confronted this
issue and have a working implementation, and you might want to check it
out before trying another.  

    If the filename parsing bugs in Sun's NFS are fixed, what will start
    showing up are these application program bugs.  While Lisp is a nice
    language, I don't think we should rewrite the world in it to solve our
    pathname representation problems.

As above, neither do I.  Object-oriented programming sure is convenient
for this problem, but certainly not necessary.

    Better to define some standard hooks
    and encourage their availability in a wide variety of environments,
    e.g. dir_part(filename) would give the directory name;
    file_part(filename) would give the filename within the directory;
    parent_of_dir(dirname); file_in_dir(file, dir) would concatenate the
    name of the dir and file; dir_in_dir(dir1, dir2) would concatenate two
    directory names; root_dir() return the root of the naming scheme, if
    any; etc.  I suspect that a set of 10 or 15 of these at most would be
    enough to provide a 99% solution to writing applications that don't
    embed file-naming-specific information, given an operating system that
    supports hierarchy.

    Some of these would be tricky, e.g. MSDOS does not implement a true
    hierarchy, but has a separate root on each device (e.g. "a:foo" gets foo
    in the current directory on disk a, while "a:/foo" gets foo in the root
    directory on a, and "/foo" gets foo in the root on the current
    disk).  My impression is that VMS is similar, though I've never worked
    with it.

    Of course, in a system that implemented NFS, these routines would have
    to RPC to the remote system at run time to determine its conventions,
    so they might not be cheap, but they would be correct.

We solve that problem by expecting the network name server to reveal the
type of host.

One more point:  we also found it convenient to have the equivalent of
the mount table scheme for some purposes.  There is a thing called a
"logical host." A logical host can be mapped to pieces of the hierarchy
of any number of real hosts.  It has a uniform syntax, and thus hides
some functionality of the underlying file systems.  It is useful when
you want to be able to assume the presence of some files at any site.
We use it to locate system files.
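
A rough Python sketch of the logical host idea follows.  The host names,
directory names, and the translate() function are invented for
illustration and say nothing about how the Symbolics implementation
stores or applies its translations.

```python
# Each logical host maps directory prefixes to (real host, real directory).
# Entries are listed most specific first; an empty prefix is the default.
TRANSLATIONS = {
    "sys": [
        (("fonts",), ("hostA", ("usr", "lib", "fonts"))),
        ((), ("hostB", ("sys",))),
    ],
}


def translate(logical_host, directory, name):
    """Map a logical (host, directory, name) onto a real host's hierarchy."""
    for prefix, (real_host, real_dir) in TRANSLATIONS[logical_host]:
        if directory[:len(prefix)] == prefix:
            return real_host, real_dir + directory[len(prefix):], name
    raise LookupError("no translation for " + logical_host)


print(translate("sys", ("fonts", "tv"), "cptfont"))
print(translate("sys", ("doc",), "readme"))
```

A site changes only the table to move its system files; programs keep
using the same logical names.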