[comp.protocols.nfs] Symlink locking considered useless over NFS

chip@tct.com (Chip Salzenberg) (04/19/91)

I sent Rahul mail on this subject some time ago, but he seems not to
have received it; thus this article debunking his locking "solution".

According to Rahul Dhesi <dhesi@cirrus.COM>:
>The usual technique for locking with a lock file ...  The ... problem
>is that exclusive creates do not work over NFS.  Solution follows.
>
>int get_a_lock()
>{
>     if (create(symlink called MUTEX that points anywhere) == failed) {
>	die("serious problem -- can't create MUTEX");
>     }

This "solution" is nothing of the kind.

NFS can report failure on a symlink creation (or on directory
creation) even if the operation succeeds.  Any locking protocol that
depends on the return code of symlink or directory creation is not
robust over NFS.

NFS's statelessness is supposed to be a feature.  Well, as far as I'm
concerned, the designers of NFS can go take a flying leap, and they
can take their stateless protocol with them.
-- 
Brand X Industries Custodial, Refurbishing and Containment Service:
         When You Never, Ever Want To See It Again [tm]
     Chip Salzenberg   <chip@tct.com>, <uunet!pdn!tct!chip>

thurlow@convex.com (Robert Thurlow) (04/20/91)

In <280EE8A1.30D@tct.com> chip@tct.com (Chip Salzenberg) writes:

>NFS's statelessness is supposed to be a feature.  Well, as far as I'm
>concerned, the designers of NFS can go take a flying leap, and they
>can take their stateless protocol with them.

C'mon Chip, flame in the right direction.  The lack of support for
O_EXCL in the create operation of NFS isn't a feature of statelessness,
it's just a simple-minded protocol bug.  If Sun would get off its cans
and tell us the way a new protocol revision would carry this information,
all of us vendors could add support in ten minutes.

But you can go ahead and flame Sun for the protocol bug and the time
they've let go by on addressing it.

Rob T
--
Rob Thurlow, thurlow@convex.com
An employee and not a spokesman for Convex Computer Corp., Dallas, TX

dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi) (04/20/91)

In <280EE8A1.30D@tct.com> chip@tct.com (Chip Salzenberg) writes:

>This "solution" is nothing of the kind.

>NFS can report failure on a symlink creation (or on directory
>creation) even if the operation succeeds.

How about calling it "the closest thing to a solution that has yet been
seen on Usenet"?

I seem to remember an email message from you to which I sent a brief
reply of thanks;  perhaps my reply failed somewhere.  (Hmmm....maybe
your mail software uses symlinks for locking, and the lock failed, so
you never got my reply :-)
--
Rahul Dhesi <dhesi@cirrus.COM>
UUCP:  oliveb!cirrusl!dhesi

kyle@uunet.UU.NET (Kyle Jones) (04/21/91)

chip@tct.com (Chip Salzenberg) writes about Rahul Dhesi's code:
 > This "solution" is nothing of the kind.
 > 
 > NFS can report failure on a symlink creation (or on directory
 > creation) even if the operation succeeds.  Any locking protocol that
 > depends on the return code of symlink or directory creation is not
 > robust over NFS.

True, but using symlinks could be a start toward a solution.
Instead of creating symlink pointing to a random place, why not
put your hostname and process ID into the symlink?  Then you
could use readlink() to read the information back from the
symlink and verify that it is your lock.  This beats link()
because you get a file creation _and_ one atomic write which you
can use for identification purposes.
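
A minimal sketch of that idea in Python, not a claim of robustness.  The
"host:pid" tag format and the function names are assumptions made for
illustration.  The point is that the return code of symlink() is ignored
and ownership is decided by reading the link back:

```python
import errno
import os
import socket


def lock_owner_tag():
    """Identify this process; the host:pid format is an assumption."""
    return "%s:%d" % (socket.gethostname(), os.getpid())


def try_symlink_lock(lock_path, tag=None):
    """Return True if lock_path is a symlink whose target is our tag.

    The result of symlink() is deliberately not trusted: over NFS a
    retransmitted request can report EEXIST for a creation that succeeded.
    """
    tag = tag or lock_owner_tag()
    try:
        os.symlink(tag, lock_path)      # the link target stores the owner tag
    except OSError as exc:
        if exc.errno != errno.EEXIST:
            raise                       # some other problem; do not hide it
    try:
        return os.readlink(lock_path) == tag
    except OSError:
        return False                    # vanished, or not a symlink at all


def release_symlink_lock(lock_path, tag=None):
    """Remove the lock only if it still carries our tag."""
    tag = tag or lock_owner_tag()
    try:
        if os.readlink(lock_path) == tag:
            os.unlink(lock_path)
    except OSError:
        pass
```

A caller would loop on try_symlink_lock() with a delay.  Stale locks left
by crashed owners and client-side caching are not handled here.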

 > NFS's statelessness is supposed to be a feature.  Well, as far as I'm
 > concerned, the designers of NFS can go take a flying leap, and they
 > can take their stateless protocol with them.

No argument here.

kyle jones   <kyle@uunet.uu.net>   ...!uunet!kyle

Oh, yeah, that was that package that I was having so much trouble
installing.  There was a combination of things going wrong, and
to make the story short, someone should go back in time and
shoot the person who invented NFS.  And then bugger the corpse.
	- one very unhappy system administrator

chip@tct.com (Chip Salzenberg) (04/22/91)

According to Rahul Dhesi <dhesi@cirrus.COM>:
>In <280EE8A1.30D@tct.com> chip@tct.com (Chip Salzenberg) writes:
>>This "solution" is nothing of the kind.
>>NFS can report failure on a symlink creation (or on directory
>>creation) even if the operation succeeds.
>
>How about calling it "the closest thing to a solution that has yet been
>seen on Usenet"?

I would agree with that assessment, as long as it were followed by the
disclaimer, "but then, 'almost solved' means 'not solved.'"

If I had a Sun machine, I'd avoid NFS-mounted mailboxes like the
plague.  (I'd shun cliches, too; but that's another story.)

>I seem to remember an email message from you to which I sent a brief
>reply of thanks;  perhaps my reply failed somewhere.  (Hmmm....maybe
>your mail software uses symlinks for locking, and the lock failed, so
>you never got my reply :-)

Well, I apologize if I ruffled any feathers.  What really sets me off
is when a vendor creates a problem for which no reasonable workaround
exists, and then lets it endure for a long time.  Sun's motto should
be: "Our Lockd Matches Our NFS Protocol: They're Both Broken".
-- 
Brand X Industries Custodial, Refurbishing and Containment Service:
         When You Never, Ever Want To See It Again [tm]
     Chip Salzenberg   <chip@tct.com>, <uunet!pdn!tct!chip>

chip@tct.com (Chip Salzenberg) (04/22/91)

According to thurlow@convex.com (Robert Thurlow):
>C'mon Chip, flame in the right direction.  The lack of support for
>O_EXCL in the create operation of NFS isn't a feature of statelessness,
>it's just a simple-minded protocol bug.

I don't understand.  How can the server know that it's me
re-requesting an already-successful creation, and not some other
process on the client machine asking to create the same file?

(Please note that solutions based on keeping transaction info for the
last N seconds are NOT acceptable, as the network cable might be
disconnected for N+1 seconds.)
-- 
Brand X Industries Custodial, Refurbishing and Containment Service:
         When You Never, Ever Want To See It Again [tm]
     Chip Salzenberg   <chip@tct.com>, <uunet!pdn!tct!chip>

thurlow@convex.com (Robert Thurlow) (04/22/91)

In <28124239.17CE@tct.com> chip@tct.com (Chip Salzenberg) writes:

>According to thurlow@convex.com (Robert Thurlow):
>>C'mon Chip, flame in the right direction.  The lack of support for
>>O_EXCL in the create operation of NFS isn't a feature of statelessness,
>>it's just a simple-minded protocol bug.

>I don't understand.  How can the server know that it's me
>re-requesting an already-successful creation, and not some other
>process on the client machine asking to create the same file?

>(Please note that solutions based on keeping transaction info for the
>last N seconds are NOT acceptable, as the network cable might be
>disconnected for N+1 seconds.)

You have to enforce do-it-once semantics to not get false failures,
right?  Over UDP, that means you need a transaction cache, which is not
a perfect solution, granted.  UDP should not be the only option.  Over
TCP, you could trust that the transaction happened only once, but you'd
still be hosed without O_EXCL in the protocol.  In my experience, UDP
with a transaction cache works pretty reliably, and NFS over TCP isn't
far away, so the limitation as I see it is the protocol bug.

Rob T
--
Rob Thurlow, thurlow@convex.com
An employee and not a spokesman for Convex Computer Corp., Dallas, TX

vjs@rhyolite.wpd.sgi.com (Vernon Schryver) (04/22/91)

In article <28124239.17CE@tct.com>, chip@tct.com (Chip Salzenberg) writes:
> According to thurlow@convex.com (Robert Thurlow):
> >C'mon Chip, flame in the right direction.  The lack of support for
> >O_EXCL in the create operation of NFS isn't a feature of statelessness,
> >it's just a simple-minded protocol bug.
> 
> I don't understand.  How can the server know that it's me
> re-requesting an already-successful creation, and not some other
> process on the client machine asking to create the same file?
> 
> (Please note that solutions based on keeping transaction info for the
> last N seconds are NOT acceptable, as the network cable might be
> disconnected for N+1 seconds.)

NFS without the XID cache is broken.  Try it; you'll hate it.
NFS is emphatically not "stateless."  NFS is "relatively stateless,
particularly when compared to other network file systems also designed in
the early and mid 1980's, such as the AT&T RFS."

The "NFS is stateless" cliche is partly marketing hype from the NFS/RFS
war, and partly a descriptive slogan for the design principle of avoiding
state when possible and of exploiting the consequent robustness against
server, client, and network failures or scaling problems.

If you dislike NFS, then please switch to RFS, AFS, or your own design.
Non-constructive complaints about basic NFS design principles are as
interesting in comp.protocols.nfs as the perennial H.R. complaints in
comp.arch about high level languages hiding machine code arcana.

The O_EXCL hole is universally considered a bug, and always named as
something to fix in the next protocol turn.  I think it was fixed in
Rusty's final proposals.  Unfortunately, the only organization that could
change the protocol has been having difficulties.
(No, I don't see how the IETF can do anything but make things worse.)

Vernon Schryver,   vjs@sgi.com

vjs@rhyolite.wpd.sgi.com (Vernon Schryver) (04/23/91)

In article <thurlow.672335997@convex.convex.com>, thurlow@convex.com (Robert Thurlow) writes:
>
>                                                ...  NFS over TCP isn't
> far away ...

This is the second recent reference I've seen to NFS/TCP/IP.  (I'm probably
not allowed to say where I saw the other reference.)  Why would anyone want
NFS/TCP?

NFS over TCP seems good if you're trying to get over a very slow or very
lossy link.  NFS/TCP can't scale a fraction as far as NFS/UDP, despite
statements to the contrary in that other, unnamed reference.  NFS/TCP seems
a generally unlikely choice.

I write this as someone who argued hard in the old NCF-NCS-NCA battles for
optional connection oriented transport for remote procedures (i.e. NCS over
TCP).  At Silicon Graphics, we care about remote procedures over TCP,
because our graphics look like function calls, and we need performance
several orders of magnitude faster than common remote procedure calls.
(by cheating, we get enough of what we need).

It seems straightforward to modify the Sun VAX reference code to use TCP
handles, should anyone want to do it.

I intend this as a serious question, not as the flame it may seem.  There
must be something I'm missing.

Vernon Schryver,   vjs@sgi.com

lm@slovax.Eng.Sun.COM (Larry McVoy) (04/23/91)

vjs@rhyolite.wpd.sgi.com (Vernon Schryver) writes:
> In article <thurlow.672335997@convex.convex.com>, thurlow@convex.com (Robert Thurlow) writes:
> >
> >                                                ...  NFS over TCP isn't
> > far away ...
> 
> 
> NFS over TCP seems good if you're trying to get over a very slow or very
> lossy link.  NFS/TCP can't scale a fraction as far as NFS/UDP, despite
> statements to the contrary in that other, unnamed reference.  NFS/TCP seems
> a generally unlikely choice.

NFS/UDP works well when you are on a local net.  In order to make it work
well over WANs, you end up essentially reimplementing TCP algs in the code
that calls UDP.  There was a nice study of all this in the Dallas Usenix
this winter.  I think the general conclusion was that NFS/TCP was about 20%
slower in the LAN case but much faster in the WAN case.  The argument can
be made (if it was not) that tuning TCP to get back that 20% is probably
a better idea than trying to make UDP deal with WANs.
---
Larry McVoy, Sun Microsystems     (415) 336-7627       ...!sun!lm or lm@sun.com

les@chinet.chi.il.us (Leslie Mikesell) (04/23/91)

In article <3074@cirrusl.UUCP> Rahul Dhesi <dhesi@cirrus.COM> writes:
>In <280EE8A1.30D@tct.com> chip@tct.com (Chip Salzenberg) writes:

>>This "solution" is nothing of the kind.
>>NFS can report failure on a symlink creation (or on directory
>>creation) even if the operation succeeds.

>How about calling it "the closest thing to a solution that has yet been
>seen on Usenet"?

There is some discussion in comp.unix.shell about using
 (umask 777; echo >file) || .... failure code...
as an attempt to get a more or less atomic operation in spite
of NFS, but I don't see how it can work if you run as root.
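
A rough Python rendering of that shell idiom, as a sketch only (the
function name is made up for illustration).  The file is created with no
permission bits, so a later attempt to open it for writing is refused for
an ordinary user; a process with root privileges is normally not stopped
by the missing write bit, which is the objection above:

```python
import os


def try_umask_lock(path):
    """Create path with mode 000; return True if this call could write it."""
    old_mask = os.umask(0o777)          # new files get no permission bits
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o666)
    except PermissionError:
        return False                    # exists and is unwritable: lock held
    finally:
        os.umask(old_mask)              # always restore the caller's umask
    os.close(fd)
    return True
```

The lock is released by unlinking the file, which depends on the
directory's permissions rather than the file's.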

Les Mikesell
  les@chinet.chi.il.us

backman@vaxeline.ftp.com (Larry Backman) (04/23/91)

In article <98765@sgi.sgi.com> vjs@rhyolite.wpd.sgi.com (Vernon Schryver) writes:
>
>NFS over TCP seems good if you're trying to get over a very slow or very
>lossy link.  NFS/TCP can't scale a fraction as far as NFS/UDP, despite
>statements to the contrary in that other, unnamed reference.  NFS/TCP seems
>a generally unlikely choice.
>
>I write this as someone who argued hard in the old NCF-NCS-NCA battles for
>optional connection oriented transport for remote procedures (i.e. NCS over
>TCP).  At Silicon Graphics, we care about remote procedures over TCP,
>because our graphics look like function calls, and we need performance
>several orders of magnitude faster than common remote procedure calls.
>(by cheating, we get enough of what we need).
>
>It seems straightforward to modify the Sun VAX reference code to use TCP
>handles, should anyone want to do it.

We at FTP Software would *love* to see some TCP NFS servers.  Our
experience has convinced us that NFS/UDP is great for Suns running
on a single local net without routers.  Once you start throwing
complex networks into the equation, with multiple routers between
you and your file system, things can break down with UDP and its
fragmentation/reassembly issues.

Secondly, even on a local net, a fast server sending an 8K Read
response at a PC with an older network card (3c503, WD8003) can
throw stuff at the PC faster than the card can take it off the wire.
This is obviously fixable by doing smaller reads; however, that is
going to affect overall throughput.

NFS/TCP solves both of those problems; the filing protocol (NFS)
can keep its large block size (8K), and let the TCP protocol
worry about how to get it through routers, retransmit backoffs,
and the like.

Of course, the TCP argument works only if your server's TCP can
put many megabits/sec. on the wire; most of the TCPs we've
dealt with lately seem to be more than capable of doing 3-4 Meg/Sec.

				Larry Backman
				backman@ftp.com

jim@cs.strath.ac.uk (Jim Reid) (04/23/91)

In article <98765@sgi.sgi.com> vjs@rhyolite.wpd.sgi.com (Vernon Schryver) writes:
   Why would anyone want NFS/TCP?

There are a number of reasons. Firstly, NFS over UDP is fine when the
client and server are on the same bit of ethernet. The bursty nature
of NFS traffic - typically a handful of packets back to back on the
wire - is not so well suited to other types of networks, for instance
when the client and server are separated by routers and/or bridges
that may drop packets now and again. [For NFS/UDP this is bad news as
it causes the whole NFS request - perhaps an 8 Kbyte read or write - to
be retransmitted.] With TCP, only the dropped packet need be
retransmitted. Another benefit of TCP is the use of round-trip times
to assess how much data can be shifted across the network. In other
words, it is more network-friendly than UDP if there is congestion or
if the throughput rates of all the network interfaces between the
client and server are not comparable. TCP's flow control algorithms
could also be used to enable NFS clients and servers to exert flow
control on each other: "don't send me too many requests because I'm
almost out of network buffers". All of these things become more
important when NFS gets used over long-haul links like T1 lines where
the overheads and costs of sending packets are more significant than
on an ethernet or token ring.
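
The cost of losing one packet out of an 8 Kbyte request can be put into
rough numbers.  The sketch below assumes a 1500-byte Ethernet MTU, a
20-byte IP header and an 8-byte UDP header, independent packet loss, and
ignores the RPC/NFS headers; all of these figures are assumptions, not
measurements:

```python
import math


def udp_fragments(payload_bytes, mtu=1500, ip_header=20, udp_header=8):
    """Number of IP fragments needed to carry one UDP datagram."""
    # Fragment data must be a multiple of 8 bytes (except in the last one).
    per_fragment = (mtu - ip_header) // 8 * 8
    return math.ceil((payload_bytes + udp_header) / per_fragment)


def datagram_delivery_probability(payload_bytes, loss_rate, **link):
    """Chance that every fragment arrives, so the datagram can be rebuilt."""
    return (1.0 - loss_rate) ** udp_fragments(payload_bytes, **link)


fragments = udp_fragments(8192)
chance = datagram_delivery_probability(8192, 0.01)
print(fragments, round(chance, 4))
```

Under these assumptions an 8192-byte transfer needs 6 fragments, and a 1%
packet loss rate leaves roughly a 94% chance that the whole request gets
through, so about 6% of requests are resent in full.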

Aside from the networking considerations, there are two other areas
where NFS/TCP will win over NFS/UDP. One is improved integrity of the
NFS protocol. With TCP, either end of the connection will know if the
other end goes away, which can make life less painful when a server
dies. Since TCP guarantees delivery of data, there is no possibility
of NFS requests and replies going astray as can happen with NFS/UDP at
present. [Consider this scenario: an NFS client makes a link request. The
server does this but the reply gets lost. The client repeats the request
after it times out, only to get an error returned because this time the
server fails the request because the link already existed! Although more
recent NFS implementations cache the last few NFS requests and use
transaction IDs to detect duplicates, it's not foolproof.]
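
The scenario in brackets can be modelled with a toy server.  This is an
illustrative sketch only: the class, its reply strings and its bounded
cache are assumptions and do not describe any real NFS implementation:

```python
from collections import OrderedDict


class ToyServer:
    """Toy model of a non-idempotent LINK handler with an optional reply cache."""

    def __init__(self, cache_size=0):
        self.names = set()               # names that already exist
        self.cache_size = cache_size     # 0 means no duplicate-request cache
        self.replies = OrderedDict()     # (client, xid) -> reply already sent

    def link(self, client, xid, name):
        key = (client, xid)
        if key in self.replies:          # retransmission: replay the old reply
            return self.replies[key]
        if name in self.names:
            reply = "EEXIST"
        else:
            self.names.add(name)
            reply = "OK"
        if self.cache_size:
            self.replies[key] = reply
            while len(self.replies) > self.cache_size:
                self.replies.popitem(last=False)   # evict the oldest entry
        return reply


plain = ToyServer()
first = plain.link("client", 1, "lock")    # reply assumed lost on the wire
retry = plain.link("client", 1, "lock")    # retransmission of the same request
print(first, retry)
```

Without the cache the retry is answered with an error although the link
was made.  With ToyServer(cache_size=2) the retry is answered from the
cache, but once enough other requests have pushed the entry out, a late
retry fails again, which is the "not foolproof" part.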

The final area where NFS over TCP beats NFS/UDP is security. Since UDP
is a datagram protocol, it is trivial to inject UDP data that looks
like NFS requests and replies into a network. This makes it
straightforward to impersonate an NFS client or server. With TCP, it
is theoretically possible but far from easy to inject data into an
existing TCP connection. Thus clients and servers can have more
confidence about who is at the other end of the NFS protocol if TCP is
used. With UDP, it could be anything on the net that sends out the
right-looking packet that is having the NFS dialogue with you.

The only reasons why NFS over TCP is not attractive are performance
and interworking. The former is not really a problem these days, given a
decent implementation. The latter is more of a problem. Since NFS over
TCP will only talk to other NFS/TCP platforms, you lose when you want
to use NFS/TCP with a "standard" implementation that only speaks UDP.

		Jim

bmw@isgtec.uucp (Bruce M. Walker) (04/24/91)

In article <28123DD5.1716@tct.com> chip@tct.com (Chip Salzenberg) writes:
> If I had a Sun machine, I'd avoid NFS-mounted mailboxes like the
> plague.

Well, my site demands centralized mailboxes.  How would you suggest
I do this without NFS?

- I am aware of POP[23]/PCMAIL/IMAP: we use elm, ucbMail and mh.  Only
  mh supports one of these (POP2).

- We have Sun4's, SGI's, DECstations, an IBM RS6000 and Convergent Tech's.
  Can one of these do a better job of being a mail-host?

--
         "Remember, only *you* can prevent emacs!"
bmw@isgtec.uucp  [ ..uunet!utai!lsuc!isgtec!bmw ]  Bruce Walker

vjs@rhyolite.wpd.sgi.com (Vernon Schryver) (04/25/91)

> > I wrote:
>    Why would anyone want NFS/TCP?

Well, after mail, postings, & time, I've thought about it some more...

In article <JIM.91Apr23135705@baird.cs.strath.ac.uk>, jim@cs.strath.ac.uk (Jim Reid) writes:
>
> There are a number of reasons. Firstly, NFS over UDP is fine when the
> client and server are on the same bit of ethernet....
>                           ..... All of these [bad] things become more
> important when NFS gets used over long-haul links ...

NFS/TCP sounds good for private data over the Internet or slower.

>  .... Since TCP guarantees delivery of data, there is no possibility
> of NFS requests and replies going astray as can happen with NFS/UDP...

NFS/TCP should suffer short network problems better than NFS/UDP.  As you
say, the need for the XID cache goes away in those cases.  But what do you
do when server or client crash and are restarted?  In principle it is easy
to re-establish the TCP circuit, but I bet the code will be buggy and
unreliable for years.  One would probably be building TCP connections in
the client and server kernels.  The fastpath and client handles get a lot
messier.

What happens when a bad network connection or circuit re-user breaks the
TCP virtual circuit?  Then the client is in the same stew as with a lost
UDP packet.  This problem cries out for the XID cache.

> The final area where NFS over TCP beats NFS/UDP is security. Since UDP
> is a datagram protocol, it is trivial to inject UDP data that looks
> like NFS requests and replies into a network. This makes it
> straightforward to impersonate an NFS client or server. With TCP, it
> is theoretically possible but far from easy to inject data into an
> existing TCP connection. Thus clients and servers can have more
> confidence about who is at the other end of the NFS protocol if TCP is
> used....

I use machines with many hundreds of NFS clients.  Extrapolating to the
Internet gets thousands of mostly idle NFS clients/servers.  Just to limit
the number of TCP connections to at most hundreds, you'll be building up
and tearing them down continually.  Good times for a bad guy to fake his
way in.  As is often said, you should not rely on TCP circuits for security.

> The only reasons why NFS over TCP is not attractive are performance
> and interworking. ...

NFS/TCP should be significantly faster than NFS/UDP, when the fast network
and not the disk is the bottleneck.  I think I know why, in my
measurements, recent 4.3BSD TCP over FDDI is significantly faster than
UDP.  Even if that is not true for other fast hardware (tho I think it
will be), consider the speed advantages of huge buffers.

The big problem with NFS/TCP still seems to me that it won't scale
as well as NFS/UDP.

Vernon Schryver,   vjs@sgi.com

emv@ox.com (Ed Vielmetti) (04/26/91)

In article <99526@sgi.sgi.com> vjs@rhyolite.wpd.sgi.com (Vernon Schryver) writes:

   The big problem with NFS/TCP still seems to me that it won't scale
   as well as NFS/UDP.

well, there are lots of things that could need to be scaled up --

number of connections to a server. NFS/UDP wins here in your analysis, or
put another way there really isn't any obviously good way to have 1000
hosts all talking directly to the same machine even though that
machine may be fast enough.

speed of transfer on fast networks.  NFS/TCP wins here because it's
easier to get a good TCP on FDDI-class nets than it is to get similar
performance out of UDP.  (!?)  There are other speed issues here that
are more implementation sensitive.

distance from client to server where this measures some combination 
of packet loss, round-trip time and variability, and other network 
nasties.  NFS/TCP wins because it can't hardly help but being better
than current NFS/UDP implementations, and because more work has been
done in getting TCP performance to be good on wide-area nets.

is this a fair assessment?  

for myself, the "scaling" i'd like to see is over wide area networks;
for that purpose NFS/TCP looks quite reasonable.  i wonder whether
it'll be real enough in time to compete for attention with the other
reasonable wide-area file service (Transarc's AFS).

-- 
 Msen	Edward Vielmetti
/|---	moderator, comp.archives
	emv@msen.com

"(6) The Plan shall identify how agencies and departments can
collaborate to ... expand efforts to improve, document, and evaluate
unclassified public-domain software developed by federally-funded
researchers and other software, including federally-funded educational
and training software; "
			High-Performance Computing Act of 1991, S. 218