[comp.protocols.nfs] NeFS

bruce@basser.cs.su.oz.au (06/29/90)

Just a brief interruption to the PC-NFS exchanges ..

NFS is a widely used network file system protocol.  Perhaps it is
most widely used between unix systems.  NFS is also used between
PC's and unix systems.  Let's leave aside its application to PC's
and consider only its use between unix systems.

Under unix, an NFS client process waits while its system call
is translated into one or more remote procedure calls (RPC's)
which are then executed on an NFS server.  Once remote execution
has completed the results are returned to the waiting process
which is then allowed to continue.

Here are some example translations (where "/n" is an
NFS mount point and the fh's are file handles):

System Call			Generated RPC(s)
-----------			----------------
stat("/n", &statb);		getattr(fh0)

stat("/n/etc", &statb);		getattr(fh0)
				lookup(fh0, "etc")

stat("/n/etc/rc0.d", &statb);	getattr(fh0)
				lookup(fh0, "etc")
				lookup(fh1, "rc0.d")

stat("/n/etc/rc0.d/a", &statb);	getattr(fh0)
				lookup(fh0, "etc")
				lookup(fh1, "rc0.d")
				lookup(fh2, "a")

chmod("/n/etc/rc0.d/a", mode);	getattr(fh0)
				lookup(fh0, "etc")
				lookup(fh1, "rc0.d")
				lookup(fh2, "a")
				setattr(fh3, .., mode, ..)
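
The component-at-a-time translation above can be sketched as a toy
loop.  Everything here (fhandle, nfs_lookup, the handle arithmetic)
is invented purely to count RPC's; it is not real NFS client code:

```c
#include <string.h>

typedef int fhandle;            /* stand-in for the opaque NFS file handle */

static int rpc_count = 0;       /* over-the-wire calls so far */

/* pretend server: each lookup hands back the "next" handle */
static fhandle nfs_lookup(fhandle dir, const char *name)
{
    (void)name;
    rpc_count++;                /* one RPC per pathname component */
    return dir + 1;
}

/* resolve a path relative to the mount-point handle fh0, the way the
   stat()/chmod() examples above walk one component at a time */
fhandle resolve(fhandle fh0, const char *path)
{
    char buf[256], *p;

    rpc_count++;                /* the initial getattr(fh0) */
    strncpy(buf, path, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';

    fhandle fh = fh0;
    for (p = strtok(buf, "/"); p; p = strtok(NULL, "/"))
        fh = nfs_lookup(fh, p); /* lookup(fh, component) */
    return fh;
}
```

Resolving "etc/rc0.d/a" this way costs four RPC's (one getattr plus
three lookups), matching the stat() example in the table.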

Each RPC consists of both a call and a reply message.
Thus 10 messages are transmitted on the network in
the "chmod()" example above.
With a conceptually simple change to the NFS protocol it might
be possible to reduce this number from 10 to 2 by
providing a new RPC which expected a file handle and a
list of pathname components as its initial arguments (rather
than just a file handle):

	nsetattr(fh0, "etc", "rc0.d", "a", 0, .., mode, ..)

Continuing with this approach we can arrive at a revised NFS protocol
which sends exactly one RPC for each client process system call.
However, we can't do better than this as long as we continue
to use the standard unix system call interface.
For example, the following pseudo-code sequence (which
might be from cpio, tar or restore):

	fd = creat("/n/a", mode);
	write(fd, "A", 1);
	close(fd);
	chown("/n/a", uid, gid);
	utime("/n/a", tp);

must generate at least 5 x 2 = 10 network messages.
There may well be an RPC which, in one operation, creates a file,
writes a byte to it and sets its uid, gid, last access time
and last modified time, but such an RPC would not be
reachable under a normal unix system.

In light of this limitation it is interesting to note that
Sun are proposing a new (postscript-based) network file
system protocol -- NeFS.
Postscript programs are written on the client, downloaded
to the server which then executes them returning results as desired.

Previous articles in this group have asked the question
"Who or what will write the postscript code?".
I believe that one proposed answer was along the lines of
"Not the applications programmer but the system programmer who
writes the kernel client code, and this will only need to be
written once for each operating system port.".
The idea implicit in this answer is that there will be some
simple mapping from each client system call to a corresponding
NeFS program template.
Other posters have pointed to the fact that RPC-based protocols
are seriously constrained by physical limits -- transcontinental
delays are of the order of milliseconds.
The NeFS approach is offered as a way around this problem.

In the appendix to Sun's Draft NeFS document ("The Network Extensible
File System Protocol Specification"), there are 3 examples given.
The second ("Read a directory") could be derived fairly directly from
the "getdents()" system call.  However, neither of the other two
("Determine disk usage in bytes" and "Copy a file")
can be generated by any single existing unix system call.

The extreme generality of NeFS and the above two examples suggest
that Sun is proposing not only a new underlying network file system
protocol but also a new unix system call interface.

I have long suspected with regard to NFS that Sun left behind some
of unix file system semantics so as to be able to support a wider
range of client and server platforms (e.g. PC's).  Now with the advent of
NeFS it seems that they may be moving further in this direction.

Perhaps some readers from Sun or elsewhere would like to comment?

Bruce Janson
Basser Department of Computer Science
University of Sydney
Sydney, N.S.W., 2006
AUSTRALIA

Internet:	bruce@basser.cs.su.oz.au
Telephone:	+61-2-692-3264
Fax:		+61-2-692-3838

.. now, back to PC-NFS.

mmeyer@next.com (Morris Meyer) (07/01/90)

In article <1075@cluster.cs.su.oz> bruce@basser.cs.su.oz.au writes:
[ ... text deleted ]
>
>I have long suspected with regard to NFS that Sun left behind some
>of unix file system semantics so as to be able to support a wider
>range of client and server platforms (e.g. PC's).  Now with the advent of
>NeFS it seems that they may be moving further in this direction.
>
>Perhaps some readers from Sun or elsewhere would like to comment?
>
>Bruce Janson

At NeXT, we have found that the traditional UNIX file system semantics
restrict the performance of our file-system based user-interface
components (open panel, save panel and workspace).  Each component is
a column-oriented browser with scrollers, and with file pathnames
separated into columns.  Next to each filename is an optional caret
'>' which indicates whether or not the pathname component is a
directory.

When a user clicks on a directory, the browser must first do a
readdir, followed by stats on all of the direntries to tell whether or
not the directory entry needs a caret next to it.  Your user
interface can vary tremendously depending on whether you are on a local
or a remote filesystem. With NeFS, this could be one client NeFS
operation that was loaded onto a server.
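
Back-of-envelope arithmetic for that browser click (not NeXT code,
and assuming nothing is cached and the readdir fits in one RPC):

```c
/* one readdir plus one attribute fetch per entry, and each RPC is a
   call message plus a reply message */
int browser_messages(int nentries)
{
    int rpcs = 1 + nentries;    /* readdir + per-entry stat/lookup */
    return 2 * rpcs;            /* every RPC is call + reply */
}
```

An 80-entry directory thus costs 162 network messages to paint one
browser column, versus 2 if the whole thing were a single request.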

When our workspace is displaying icons for filenames, it has to
generate them based upon file extension and whether or not the file is
executable.  Programs "register" their icons with the workspace by
indicating which file extensions that they can handle.  An icon view
is expensive and inelegant with current Unix filesystem semantics.
With NeFS one could download directory semantics that have an icon, a
registered program, etc inside the actual directory.

		--morris

		morris meyer (mmeyer@next.com)
		software engineer
		NeXT OS Group
 

vjs@rhyolite.wpd.sgi.com (Vernon Schryver) (07/02/90)

In article <1075@cluster.cs.su.oz>, bruce@basser.cs.su.oz.au writes:
...
> 
> chmod("/n/etc/rc0.d/a", mode);	getattr(fh0)
> 				lookup(fh0, "etc")
> 				lookup(fh1, "rc0.d")
> 				lookup(fh2, "a")
> 				setattr(fh3, .., mode, ..)
> Each RPC consists of both a call and a reply message.
> Thus 10 messages are transmitted on the network in
> the "chmod()" example above.
...

If the client does any caching, as would be expected if the client NFS
implementation were at all related to a disk file system implementation,
then you would not expect 10 RPC's to be needed for the vast majority of
all such system calls.  For one thing, one (either a person or a program)
rarely fiddles with /a/b/c/d/e/foo without also fiddling with /a/b/c/d/e/bar
at about the same time, so that the full attributes including
rnode/vnode/whatever are likely to be present when you use a long pathname.

A quick look at `nfsstat` on machines around here validates that
intuition.  (NFS is used quite heavily for Silicon Graphics software
development, as well as for usenet-news and other stuff).  I see far more
getattr()'s, both client and server, than lookup()'s.  One often sees more
getattr()'s than read()'s, but I suspect that is a result of the familiar
3/30 second cache timeouts.

The RISC trick of making the common stuff lean, mean and fast, and the
uncommon stuff only functional should be used in file system design.
One should have only those remote file system operations that are
either absolutely primitive or sufficiently common.


Vernon Schryver
vjs@sgi.com

brent@terra.Eng.Sun.COM (Brent Callaghan) (07/03/90)

In article <1075@cluster.cs.su.oz>, bruce@basser.cs.su.oz.au writes:
> Each RPC consists of both a call and a reply message.
> Thus 10 messages are transmitted on the network in
> the "chmod()" example above.
> With a conceptually simple change to the NFS protocol it might
> be possible to reduce this number from 10 to 2 by
> providing a new RPC which expected a file handle and a
> list of pathname components as its initial arguments (rather
> than just a file handle):
> 
> 	nsetattr(fh0, "etc", "rc0.d", "a", 0, .., mode, ..)
> 

Actually things aren't quite that bad with most Unix client
implementations.  The client's directory name lookup cache
(dnlc) will cache the intermediate directory name translations.
If it's the first reference to a path you will get stuck
with having to do separate over-the-wire lookup operations
per pathname component.  NFS gives you no choice here.

We have messed with the idea of having a generalized NFS LOOKUP
operation that would take a vector of names - components in a
pathname.  This comes under the heading of "Multi Component Lookup"
(MCL).  This is achievable via the current vnode interface - no new
system calls are required - our implementation of RFS under vnodes
already uses MCL.  HOWEVER - here's the crunch: in typical
filesystem traffic MCL is hardly ever useful (my observation).  The
dnlc does a pretty good job of avoiding long strings of client
pathname lookups.  Adding MCL to a protocol rev of NFS would be
"nice" but it comes at a cost of increasing the complexity of the
protocol for little added benefit - the RISC vs CISC that Vernon
Schryver referred to.  Herein is the advantage of a protocol like
NeFS:  it doesn't *explicitly* support MCL but you can easily
*express* MCL in it.

A better example perhaps is something akin to Morris Meyer's. 
I have "ls" aliased to "ls -F".  This appends a character to
names in the listing to indicate their type e.g. "/" means 
directory, "*" means executable, "@" means symlink.  These are
derived from the file attributes.  File attributes are not
returned by NFS's READDIR.  The client is forced to LOOKUP
every name in the directory to get its attributes e.g. in
my home directory with ~80 names I generate 80 over-the-wire
LOOKUP's to the server.  In large directories it's noticeably
faster just to "/bin/ls" and skip the NFS overhead imposed by
generating the funny little characters.

It gets worse if you use a window-based file manager as used
in MacOS, MS Windows, NeXT & OpenWindows.  It's common to
pop open a folder and view the files as labelled icons.  The
labelled icons are derived either by inspection of file attributes
or by reading some file data.  Here interactive response can
suffer noticeably if access is through the NFS protocol.

> In the appendix to Sun's Draft NeFS document ("The Network Extensible
> File System Protocol Specification"), there are 3 examples given.
> The second ("Read a directory") could be derived fairly directly from
> the "getdents()" system call.  However, neither of the other two
> ("Determine disk usage in bytes" and "Copy a file")
> can be generated by any single existing unix system call.

Yes, NeFS sets out to be a distributed filesystem protocol - not
specifically a Unix protocol.  The examples are intended to
demonstrate the power of the protocol - not what's narrowly expressible
via the current vnode interface.

> The extreme generality of NeFS and the above two examples suggest
> that Sun is proposing not only a new underlying network file system
> protocol but also a new unix system call interface.

It will take years for any new distributed filesystem protocol to
become as ubiquitous as NFS.  Nobody can predict the changes to
interfaces that utilize the protocol - Unix system calls and the
vnode interface are just one example.  Assuming that such a protocol
becomes generally accepted, it should be able to evolve with changes
in OS and filesystem design.

We're not proposing a new system call interface, just a protocol that
won't stand as a barrier to the network extension of changes to
the filesystem interface - no matter who implements it.
--

Made in New Zealand -->  Brent Callaghan  @ Sun Microsystems
			 uucp: sun!bcallaghan
			 phone: (415) 336 1051

stukenborg@egghead.rtp.dg.com (Stephen Stukenborg) (07/03/90)

In article <1075@cluster.cs.su.oz> bruce@basser.cs.su.oz.au writes:
>In light of this limitation it is interesting to note that
>Sun are proposing a new (postscript-based) network file
>system protocol -- NeFS.
>Postscript programs are written on the client, downloaded
>to the server which then executes them returning results as desired.

It seems that in the first go around, most comments centered around
whether or not postscript was a "good" language for implementing the
protocol primitives.  Little attention was given to other aspects
of the NeFS protocol.  I would like to throw out a few comments on the
paper, mostly because I'm sick of wading through the PC-NFS traffic.

The first section of the paper calls out six problem areas of the 
version 2 NFS protocol, and claims that NeFS addresses them.  There
are still some large pieces missing from the protocol before it can
claim to be operating system independent.

1. Size limitations.

Yes, NeFS addresses this by increasing the size of some structures.
I assume any new protocol will fix these limitations.

2. Non-idempotent procedures.

Well, like the rev. 2 protocol, NeFS still assumes a stateless server 
implementation.  Section 2.3 states "The NeFS protocol does not 
guarantee idempotent operations per se, but it does allow requests to 
be assembled that can detect retries of themselves ..."  Maybe I'm
missing something, but I don't see which NeFS operations aid 
retransmission detection.  
How can I detect retransmissions unless there is a unique
identifier in the request and I keep a cache of duplicates (section
3.3.3)?  What is in the NeFS protocol that solves this problem?  
Nothing that I can see.  (The "valid" operator?)

Let's take a simple protocol failing from rev. 2, "exclusive create".  
How do I build an exclusive create that detects retransmissions?
In other words, how do I detect the situation where the client transmits 
the request, server does exclusive create, server sends reply "success", 
client never gets the reply, times out and retransmits the request.
My exclusive create should return "success".  Without a duplicate
request cache on the server, it will return "failed - file exists".
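
A toy simulation of that failure (not NeFS code; the in-memory file
table stands in for the server's filesystem) shows why the operation
is not idempotent - replaying the identical request gets a different
answer:

```c
#include <string.h>

#define MAXFILES 8
static char files[MAXFILES][32];
static int nfiles = 0;

/* stateless server's exclusive create: no memory of past requests */
const char *create_excl(const char *name)
{
    for (int i = 0; i < nfiles; i++)
        if (strcmp(files[i], name) == 0)
            return "failed - file exists";  /* what the retry sees */
    strncpy(files[nfiles], name, 31);
    files[nfiles][31] = '\0';
    nfiles++;
    return "success";
}
```

The first call answers "success"; the retransmission of the very same
request answers "failed - file exists", even though the client's
operation actually succeeded.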

I guess they have fixed the "remove" idempotency problem by saying 
that remove succeeds if the entry does not exist.  But what about rename?
The NeFS protocol does little to address idempotency.

3. Unix bias

Well, the draft I have (2/12/90) has several places with Unix bias.  

Look at the definition of file attributes (4.2.3).  Those
permission bits are Unix.  What about access control lists?  Should
file permission be dependent on the "ftype" field?

The "link" operator is called out in the document as being
very Unix-specific.  I agree with the statement that the "operator
list should be confined to only those operators that all
implementations can reasonably expect to implement."   So you 
define a file type "directory", and define operators that deal 
with directories (e.g. add entry, delete entry, lookup, etc.).
Having OS-specific operators is the wrong answer if you are trying 
to define an operating system independent protocol.

Although it is really an RPC issue, authentication is something that is 
given short shrift in the document.  Section 3.3.4 says that "authentication 
information in the environment is conveyed by the interpreter to the 
NeFS operators to be used in checking access permissions for filesystem 
objects."  I hope they're not assuming AUTH_UNIX authentication like
the version 2 protocol did.

4. No access procedure.

I assume this makes overt the covert permission hacks in the rev. 2 
protocol. (e.g. the server allows the owner of a file to read/write it
no matter what the file permissions are set to.)

5. No facility to support atomic filesystem operations.

As it stands now, the "lock" operator needs work.  "Beware of 
deadlocks" when nesting locks is the wrong answer. Nor should
it be an error to nest locks.  The "lock" operator should do deadlock 
detection.  

What does the spec mean in the IMPLEMENTATION section when it states 
"mandatory locking is required for all NeFS clients"?  That each
filesystem operator is going to get a lock?  (e.g. the write 
operator is going to get an exclusive lock on the file before it writes 
the data?)  Locks are mandatory only if everyone is forced to use
them.

6. Performance.

You can definitely see where defining NeFS "programs" that can do
several remote filesystem operations can be a big win.  This seems to be 
the real strength of an extensible protocol.

An extensible protocol is an innovative idea for the next version of
NFS.  I just think there needs to be more discussion of the "real"
protocol issues, rather than whether or not postscript is the proper
language to implement an extensible protocol.

Steve Stukenborg
Data General Corporation
62 Alexander Drive			stukenborg@dg-rtp.dg.com
Research Triangle Park, NC  27709	...!mcnc!rti!xyzzy!stukenborg

gray@Neon.Stanford.EDU (Cary G. Gray) (07/04/90)

Brent Callaghan's latest message points out clearly what I judge
to be a serious flaw in the NeFS spec:  the absence of support for
coherent (aka consistent) caching.  He gives two examples for which
caching on clients is an excellent fit--and for which, therefore, NFS
and NeFS are poorly suited.

First, in deprecating multiple-component name lookup, he notes that
the client "directory name lookup cache" in most Unix implementations
eliminates most over-the-wire name lookups.  Fine.  But it isn't
coherent, so it's going to surprise me someday when I try to use two
hosts.

The second example is the 'ls -F'.  Here the attributes need to be
cached as well.  The same applies to the example of needing attributes
to support an interface with icons.

Caching is not a panacea, but neither is coherent caching impossible--or
even impractical.  Interested folks should look at two papers in the
proceedings of last December's SOSP (published as Operating Systems 
Review 23(5)).  First, Srinivasan and Mogul ("Spritely NFS") add support
for coherence of file contents to NFS, reaping a significant improvement
in performance.  Their approach does require additional server state; for
a way to make that state "soft"--i.e., to allow recovery when it is lost--
see the paper by Gray and Cheriton on "Leases".  The ideal would combine
the two ideas and add coherent caching of naming and attribute information.
It's not trivial to do so, but it is largely straightforward.

	Cary Gray
	gray@cs.stanford.edu

brent@terra.Eng.Sun.COM (Brent Callaghan) (07/04/90)

In article <1990Jul3.172516.16691@Neon.Stanford.EDU>, gray@Neon.Stanford.EDU (Cary G. Gray) writes:
> Brent Callaghan's latest message points out clearly what I judge
> to be a serious flaw in the NeFS spec:  the absence of support for
> coherent (aka consistent) caching.

It's true, the current spec makes no mention of explicit support for
client cache consistency.  This is just a first draft of the protocol
spec - a request for comments.  There's definitely room for a client
cache consistency model in NeFS - perhaps more than one.  It's just
not clear at this stage what's appropriate.  A cache consistency 
model that assumes a stateful server is OK provided the state is
easy to recover and doesn't become a burden as the number of clients
scales.

> First, in deprecating multiple-component name lookup, he notes that
> the client "directory name lookup cache" in most Unix implementations
> eliminates most over-the-wire name lookups.  Fine.  But it isn't
> coherent, so it's going to surprise me someday when I try to use two
> hosts.

The original discussion was over a DFS protocol that allowed "batching"
of filesystem requests.  My contention was that MCL is OK but don't
expect to see a noticeable improvement in performance if you are
using a DNLC.  The lack of complete cache consistency of a DNLC over
an NFS filesystem is not something I take issue with.

> The second example is the 'ls -F'.  Here the attributes need to be
> cached as well.  The same applies to the example of needing attributes
> to support an interface with icons.

Again, you missed the point of the discussion: no matter what kind
of caching you employ, the first time you "ls -F" in a filesystem
you'll have to wait while the underlying DFS handles all the
separate over-the-wire attribute requests. Caching isn't going
to make your folder full of icons open any faster.

> Caching is not a panacea, but neither is coherent caching impossible--or
> even impractical.  Interested folks should look at two papers in the
> proceedings of last December's SOSP (published as Operating Systems 
> Review 23(5)).  First, Srinivasan and Mogul ("Spritely NFS") add support
> for coherence of file contents to NFS, reaping a significant improvement
> in performance.  Their approach does require additional server state; for
> a way to make that state "soft"--i.e., to allow recovery when it is lost--
> see the paper by Gray and Cheriton on "Leases".  The ideal would combine
> the two ideas and add coherent caching of naming and attribute information.
> It's not trivial to do so, but it is largely straightforward.

Spritely NFS *does* require additional server state - for each active
file it has to keep a record of which clients have it open and what
they're doing with it.

I agree that coherent caching is not only desirable but that it can
also significantly improve performance.  If it means sacrificing
server statelessness then so be it - just make sure that the state
is scalable and easy to recover.

--

Made in New Zealand -->  Brent Callaghan  @ Sun Microsystems
			 uucp: sun!bcallaghan
			 phone: (415) 336 1051

stukenborg@egghead.rtp.dg.com (Stephen Stukenborg) (07/05/90)

In article <1990Jul3.172516.16691@Neon.Stanford.EDU> gray@Neon.Stanford.EDU (Cary G. Gray) writes:
>Brent Callaghan's latest message points out clearly what I judge
>to be a serious flaw in the NeFS spec:  the absence of support for
>coherent (aka consistent) caching.  He gives two examples for which
>caching on clients is an excellent fit--and for which, therefore, NFS
>and NeFS are poorly suited.
>...
>Caching is not a panacea, but neither is coherent caching impossible--or
>even impractical.  Interested folks should look at two papers in the
>proceedings of last December's SOSP (published as Operating Systems 
>Review 23(5)).  First, Srinivasan and Mogul ("Spritely NFS") add support
>for coherence of file contents to NFS, reaping a significant improvement
>in performance.  

I don't know if I would call the performance improvement "significant".
Yes, the Andrew benchmarks ran 15-20% faster using Spritely NFS (SNFS) vs.
vanilla NFS.  But they call out in the paper that they used
4K block sizes rather than 8K blocks, which was to SNFS's advantage
(NFS was forced to do more write calls).
More importantly, they also noted that their NFS implementation
had a bug where the client data cache is invalidated when a file is
closed, preventing the client from using its cached copy (NFS is forced
to do more read calls).  Also, SNFS uses a delayed-write policy
(they sync their buffers after aging them 30 seconds), where NFS syncs a file's
buffers on close.  This is to SNFS's advantage, in that SNFS is not
"paying" for the writes as part of the benchmark timing, where NFS has to pay.
The tradeoff is robustness.  The SNFS policy may be great for temporary 
file generation (which the Andrew benchmarks do a lot of), but it 
is certainly less robust than the NFS policy.  Would users be satisfied
with this tradeoff?  Probably.  (They already accept it in the local
file system case.)  SNFS also is likely to lose some performance when
(and if) they implement crash recovery.

I'm not saying that cache consistency is bad, just that you have to
take the SNFS numbers with a grain of salt.  I believe that it's a win 
to show that you can have cache consistency and still have performance
comparable to NFS.  I also believe that the "next" NFS has to have
cache consistency.  People aren't going to put up with this "stateless"
junk forever.

>Their approach does require additional server state; for
>a way to make that state "soft"--i.e., to allow recovery when it is lost--
>see the paper by Gray and Cheriton on "Leases".  The ideal would combine
>the two ideas and add coherent caching of naming and attribute information.
>It's not trivial to do so, but it is largely straightforward.

It's an interesting paper.  I guess the only drawback is that they 
haven't applied it to a "real" system to verify their analysis.

The real drawback to Sprite and Spritely NFS is that crash recovery is 
not (currently) implemented.  The Sprite paper even says that clients
who are using files from a crashed server have to be killed.  Somehow
I think that customers would be more than a little miffed over that
sort of "crash recovery" scheme.  The "leases" paper looks like a
good compromise between statelessness and cache consistency.


Steve Stukenborg
Data General Corporation
62 Alexander Drive			stukenborg@dg-rtp.dg.com
Research Triangle Park, NC  27709	...!mcnc!rti!xyzzy!stukenborg

mogul@wrl.dec.com (Jeffrey Mogul) (07/07/90)

In article <1990Jul4.174509.10106@dg-rtp.dg.com> stukenborg@egghead.dg.com () writes:
>I don't know if I would call the performance improvement "significant".

In spite of the various qualifications pointed out in your message,
I still think the results are "significant" in the sense that
as hardware speeds scale differentially (disk seek times not improving
as fast as anything else, and network latencies somewhat constrained
by the speed of light) the places where an NFS-like protocol wastes
time are going to end up as serious bottlenecks.  But you're right
that people replacing NFS with SNFS today aren't going to end up
getting their daily work done by noon as a result.

>Also, SNFS uses a delayed-write policy (they sync their buffers after
>aging them 30 seconds), where NFS syncs a file's
>buffers on close.  This is to SNFS's advantage, in that SNFS is not
>"paying" for the writes as part of the benchmark timing, where NFS has to pay.
>The tradeoff is robustness.  The SNFS policy may be great for temporary 
>file generation (which the Andrew benchmarks do a lot of), but it 
>is certainly less robust than the NFS policy.  Would users be satisfied
>with this tradeoff?  Probably.  (They already accept it in the local
>file system case.)  SNFS also is likely to lose some performance when
>(and if) they implement crash recovery.

More precisely, Sprite (or SNFS) allows an application to decide if it
wants NFS-like "robustness" (i.e., data forced to disk on close) or
UFS-like performance (with 30 seconds of vulnerability).  NFS doesn't
give you the choice.

>The real drawback to Sprite and Spritely NFS is that crash recovery is 
>not (currently) implemented.  The Sprite paper even says that clients
>who are using files from a crashed server have to be killed.  Somehow
>I think that customers would be more than a little miffed over that
>sort of "crash recovery" scheme.  The "leases" paper looks like a
>good compromise between statelessness and cache consistency.

In Sprite, crash recovery is most certainly implemented.  It was
already done at the time the SNFS paper was written; see the reference
to Brent Welch's thesis (which might have been published by now).
I've even seen the source code.

I've had several "discussions" with John Ousterhout over the relative
virtues of lease-based recovery for Sprite-like systems.  I think
we decided that Sprite-recovery was better for Sprite (because it
had some additional useful features) but that a lease-based system
would be much easier to retrofit into an NFS-based system such as
SNFS.  I've been trying to convince someone to reimplement SNFS using
a more modern NFS port, and to add lease-based recovery, but so
far nobody has had the ambition.

-Jeff

brent@terra.Eng.Sun.COM (Brent Callaghan) (07/10/90)

References: <1075@cluster.cs.su.oz> <1990Jul3.164301.2676@dg-rtp.dg.com>

In article <1990Jul3.164301.2676@dg-rtp.dg.com>, stukenborg@egghead.rtp.dg.com (Stephen Stukenborg) writes:
> 2. Non-idempotent procedures.
> 
> Well, like the rev. 2 protocol, NeFS still assumes a stateless server 
> implementation.  Section 2.3 states "The NeFS protocol does not 
> guarantee idempotent operations per se, but it does allow requests to 
> be assembled that can detect retries of themselves ..."  Maybe I'm
> missing something, but I don't see which NeFS operations aid 
> retransmission detection.  
> How can I detect retransmissions unless there is a unique
> identifier in the request, and I keep a cache of duplicates (section
> 3.3.3).  What is in the NeFS protocol that solves this problem?  
> Nothing that I can see.  (The "valid" operator?)

NeFS tries not to make too many assumptions about the underlying transport.
If the transport takes care of retransmissions and guarantees "at most once"
semantics then it's a waste of time having explicit support for it in
the protocol.  

> Let's take a simple protocol failing from rev. 2, "exclusive create".  
> How do I build an exclusive create that detects retransmissions?
> In other words, how do I detect the situation where the client transmits 
> the request, server does exclusive create, server sends reply "success", 
> client never gets the reply, times out and retransmits the request.
> My exclusive create should return "success".  Without a duplicate
> request cache on the server, it will return "failed - file exists".

The client could carry a timestamp with its request and insert it
into an attribute like the modification time.  A retransmission can
be detected firstly by the existence of the file (via the valid
operator) then by checking the timestamp (and any other attributes).
Ditto for rename or any other filesystem modification request.

A better scheme might be the establishment of a duplicate request
cache accessible to requests.  The client could stash xid's in the
cache and make explicit references to the cache to detect retransmissions.
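
That second scheme is easy to picture.  A minimal sketch (the
fixed-size array and the seen_before() name are invented for
illustration; a real cache would also time entries out and evict the
oldest first):

```c
/* server-side duplicate-request cache keyed on RPC transaction ids */
#define CACHE_SLOTS 16
static unsigned cached_xid[CACHE_SLOTS];
static int ncached = 0;

/* returns 1 if this xid was already seen (a retransmission),
   otherwise records it and returns 0 */
int seen_before(unsigned xid)
{
    for (int i = 0; i < ncached; i++)
        if (cached_xid[i] == xid)
            return 1;
    cached_xid[ncached % CACHE_SLOTS] = xid;  /* overwrite when full */
    if (ncached < CACHE_SLOTS)
        ncached++;
    return 0;
}
```

A request that finds its own xid already cached knows it is a retry
and can return the saved reply instead of re-executing.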


> 3. Unix bias
> 
> Well, the draft I have (2/12/90) has several places with Unix bias.  
> 
> Look at the definition of file attributes (4.2.3).  Those
> permission bits are Unix.  What about access control lists?  Should
> file permission be dependent on the "ftype" field?

It's a more general issue than just "Unix Bias".  How do you design
a protocol that supports remote access to a wide variety of filesystems
from a wide variety of OS's without generating a monster that nobody
would choose to implement?  It's not an easy problem.  I don't think
there's any difficulty in making NeFS support lots of different
filesystems.  The difficulty is in finding a "common set" of file
attributes and operations that supports interoperability.  This
is a worthy goal if the "common set" can be made small and easy
to understand and map to.  There's not a lot of common ground when
it comes to file permissions.  Let me have your ideas.

> The "link" operator is called out in the document as being
> very Unix-specific.  I agree with the statement that the "operator
> list should be confined to only those operators that all
> implementations can reasonably expect to implement."   So you 
> define a file type "directory", and define operators that deal 
> with directories (e.g. add entry, delete entry, lookup, etc.).
> Having OS-specific operators is the wrong answer if you are trying 
> to define an operating system independent protocol.

All filesystems in common use support some notion of a directory.  Flat
filesystems may allow you to operate on "the" directory but reject
operations that attempt to create or delete a directory.  This is
entirely reasonable.  The "link" operator sticks out as being Unix-specific
but the notion of allowing filesystem objects to have multiple names
appears to be generally useful.  The "common set" of operations is
just an ideal that implementations can attempt to conform to.  It's
OK to return an ENOTSUPP if the operation has no meaning - just as
it is now for NFS clients attempting to create symbolic links on
System V filesystems.

> Although it is really an RPC issue, authentication is something that is 
> given short shrift in the document.  Section 3.3.4 says that "authentication 
> information in the environment is conveyed by the interpreter to the 
> NeFS operators to be used in checking access permissions for filesystem 
> objects."  I hope they're not assuming AUTH_UNIX authentication like
> the version 2 protocol did.

No.  NFS is not restricted to AUTH_UNIX.  It'll support whatever
authentication is conveyed by the underlying RPC authentication
flavor.  There are already implementations of "secure" (public key)
and Kerberos authenticated NFS.  It's not appropriate for NeFS as
a protocol to spell out what's to be used for authentication.  It'll
use whatever is provided.

> 5. No facility to support atomic filesystem operations.
> 
> As it stands now, the "lock" operator needs work.  "Beware of 
> deadlocks" when nesting locks is the wrong answer. Nor should
> it be an error to nest locks.  The "lock" operator should do deadlock 
> detection.  

Yes.  A better design for this operator would be to allow a vector
of filehandles to be locked.  This would allow the server to
order the locks appropriately to avoid deadlock.
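
A minimal sketch of that ordering trick (lock_one() is a placeholder
for the real lock operator, and treating handles as unsigned ints is
an assumption made here for brevity):

```c
#include <stdlib.h>

static int cmp_handle(const void *a, const void *b)
{
    unsigned x = *(const unsigned *)a, y = *(const unsigned *)b;
    return (x > y) - (x < y);
}

static void lock_one(unsigned fh) { (void)fh; /* acquire server lock */ }

/* sort the filehandle vector into a canonical order before locking,
   so two requests locking overlapping sets can never take the same
   pair of locks in opposite orders -- the classic deadlock */
void lock_vector(unsigned *fh, int n)
{
    qsort(fh, n, sizeof *fh, cmp_handle);
    for (int i = 0; i < n; i++)
        lock_one(fh[i]);
}
```

Since every request acquires in the same global order, a circular
wait can never form.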


> What does the spec mean in the IMPLEMENTATION section when it states 
> "mandatory locking is required for all NeFS clients"?  That each
> filesystem operator is going to get a lock?  (e.g. the write 
> operator is going to get an exclusive lock on the file before it writes 
> the data?)  Locks are mandatory only if everyone is forced to use
> them.

Yes. It means that if you hold an exclusive lock, all other clients
attempting to use the filehandle will block until the lock is freed.
If you hold a non-exclusive lock, other clients that would change the
object that corresponds to the filehandle will block. A lock is limited
to the duration of a request.

> 6. Performance.
> 
> You can definitely see where defining NeFS "programs" that can do
> several remote filesystem operations can be a big win.  This seems to be 
> the real strength of an extensible protocol.

> An extensible protocol is an innovative idea for the next version of
> NFS.  I just think there needs to be more discussion of the "real"
> protocol issues, rather than whether or not postscript is the proper
> language to implement an extensible protocol.

Right on!
--

Made in New Zealand -->  Brent Callaghan  @ Sun Microsystems
			 uucp: sun!bcallaghan
			 phone: (415) 336 1051