[comp.protocols.nfs] Locking Violation Problem

andrew@siesoft.co.uk (Andrew Sinclair) (08/24/90)

In the PCNFS users guide under the section "locking files" it says :
If you invoke locking services with /MS ..... any DOS system call affecting the drive will cause a fatal error :

Locking violation
Abort, Retry or Ignore:

It then says that to recover you must unmount and remount the drive. Why ? 
Shouldn't it allow access once the lock has been removed (by the 
application/client that locked in the first place). Does this suggest that the 
client that gets in first and locks the file does not unlock it when it
has finished ? 

Sounds like an odd spec for the lock manager to me! Is there a way out of this?
I do not want to remount my drives every time I see a locked file (I'll wait).
What if I have other files open on that drive that I don't want to close ?

Thanx in advance
Andrew S.
--
+---------------------+----------------------------------------------------+
|Andrew Sinclair      |andrew.......                                       |   
|Siemens SDG          |                                                    |   
|Nixdorf House        |                                                    |

geoff@hinode.East.Sun.COM (Geoff Arnold @ Sun BOS - R.H. coast near the top) (08/25/90)

Quoth andrew@siesoft.co.uk (Andrew Sinclair) (in <1990Aug24.090955.28756@siesoft.co.uk>):
#In the PCNFS users guide under the section "locking files" it says :
#If you invoke locking services with /MS ..... any DOS system call affecting the drive will cause a fatal error :
#
#Locking violation
#Abort, Retry or Ignore:
#
#It then says that to recover you must unmount and remount the drive. Why ?
#Shouldn't it allow access once the lock has been removed (by the
#application/client that locked in the first place). Does this suggest that the
#client that gets in first and locks the file does not unlock it when it
#has finished ?
#

We're reworking that section of the manual (faint :-)

The only time you need to unmount and remount the drive is
when the remote lock manager has gone away (either due to
network problems, server reboot, or genuine lock manager failure).
This is a (painful) tradeoff. Suppose the server is rebooted
while you have locks established. How do you reestablish the
locks? The lock manager/status monitor architecture, with a
strong bias towards "real" operating systems like Unix (;-)
uses the following model (grossly simplified):

(1) when a client requests a lock on a file, the Network lock manager
(NLM) on the client sends a lock request to the NLM on the server

(2) the server NLM notifies the local status monitor (SM) that it needs
to keep the client informed of any status changes

(3) if the server SM hasn't contacted this client before, it
makes a call to the client SM to establish bidirectional
notification

(4) eventually the client relinquishes the lock, and calls the
server NLM

(5) if the server is holding no more locks for the client, the server SM
calls the client SM to terminate their monitoring.
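
As a rough illustration of steps (1) to (5), here is a toy model in
Python. It is only a sketch: the class names, method names, host names
and path are made up for the example, and the real NLM and SM exchange
RPC calls over the network rather than calling each other's methods.

```python
# Toy model of steps (1)-(5). All names here are hypothetical; real
# NLM/SM traffic is RPC over the network, not in-process method calls.

class StatusMonitor:
    """Stands in for the SM on one host; tracks which peers it monitors."""

    def __init__(self, host):
        self.host = host
        self.monitored = set()

    def monitor(self, peer_sm):
        # Step (3): establish bidirectional notification on first contact.
        self.monitored.add(peer_sm.host)
        peer_sm.monitored.add(self.host)

    def unmonitor(self, peer_sm):
        # Step (5): tear the monitoring relationship down again.
        self.monitored.discard(peer_sm.host)
        peer_sm.monitored.discard(self.host)


class ServerNLM:
    def __init__(self, sm):
        self.sm = sm
        self.locks = {}  # path -> name of the client holding the lock

    def lock(self, client_sm, path):
        # Step (1) arrives here as a lock request from the client NLM.
        if path in self.locks and self.locks[path] != client_sm.host:
            return False
        # Steps (2)-(3): have the local SM monitor this client if it is new.
        if client_sm.host not in self.sm.monitored:
            self.sm.monitor(client_sm)
        self.locks[path] = client_sm.host
        return True

    def unlock(self, client_sm, path):
        # Step (4): the client relinquishes the lock.
        if self.locks.get(path) == client_sm.host:
            del self.locks[path]
        # Step (5): nothing left for this client, so stop monitoring it.
        if client_sm.host not in self.locks.values():
            self.sm.unmonitor(client_sm)


server_sm = StatusMonitor("server")
client_sm = StatusMonitor("pc1")
nlm = ServerNLM(server_sm)

granted = nlm.lock(client_sm, "/export/data.db")
monitored_while_locked = "pc1" in server_sm.monitored
nlm.unlock(client_sm, "/export/data.db")
monitored_after_unlock = "pc1" in server_sm.monitored
```

The point to notice is that the monitoring relationship only exists
while the server is holding at least one lock for the client.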

Now, if the server is rebooted between (3) and (4), the server
NLM and SM start up and go into recovery mode:

(3a) The server NLM begins its "grace period". During the grace period,
it will only accept requests to reclaim locks.

(3b) The server SM checks to see which clients it was responsible for
(recorded in the "/etc/sm" directory) and calls the SM on each
client to advise it that the server has rebooted.

(3c) The client SM advises the client NLM of the server state change.

(3d) The client NLM checks its database for any locks it holds on
that server and issues "reclaim" requests for those locks.

(3e) After a suitable period, the server NLM ends the grace period and
normal service resumes.
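
The recovery sequence (3a) to (3e) can be sketched the same way. Again
this is a simplified model under assumptions: the names, the string
return values and the in-memory client list are invented for the
example (the real SM keeps its list on disk), and timing is ignored.

```python
# Toy model of recovery steps (3a)-(3e). All names and return values
# are hypothetical; timing and RPC details are left out.

class ServerNLM:
    def __init__(self):
        self.locks = {}        # path -> client name; empty after a reboot
        self.in_grace = True   # step (3a): grace period starts at boot

    def request(self, client, path, reclaim=False):
        # During the grace period only reclaim requests are accepted.
        if self.in_grace and not reclaim:
            return "denied_grace_period"
        if self.locks.get(path, client) != client:
            return "denied"
        self.locks[path] = client
        return "granted"

    def end_grace(self):
        # Step (3e): normal service resumes.
        self.in_grace = False


class ClientNLM:
    def __init__(self, name):
        self.name = name
        self.held = []  # paths this client believes it has locked

    def server_rebooted(self, server_nlm):
        # Steps (3c)-(3d): told of the reboot, reclaim every lock we held.
        return [server_nlm.request(self.name, p, reclaim=True)
                for p in self.held]


def notify_clients(monitored_clients, server_nlm):
    # Step (3b): the server SM walks its list of monitored clients.
    return {c.name: c.server_rebooted(server_nlm) for c in monitored_clients}


client = ClientNLM("unixbox")
client.held = ["/export/a", "/export/b"]

server = ServerNLM()  # state just after the reboot
new_request = server.request("other", "/export/a")
reclaims = notify_clients([client], server)
server.end_grace()
late_request = server.request("other", "/export/a")
free_request = server.request("other", "/export/c")
```

In this model a new request made during the grace period is turned
away, the reclaims succeed, and once the grace period ends another
client can only lock files that were not reclaimed.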

[The problem of a client reboot is less traumatic: the client SM simply
calls up the server SM, which advises the server NLM to release all
locks held for the client.]

Now, the problem on the PC is how to handle this SM and NLM traffic.
To participate in this, you must implement an SM, which in turn
means implementing a portmapper. Then you have to be sure that when
the SM notifies you of a state change you can issue all the reclaims
in time, regardless of what the PC is doing. There is also
a fair amount of "shadow state" to hold in order to cope with
failures during the recovery process.

For PC-NFS, we decided that we couldn't justify adding all of this
baggage into the product, and that we would have to adopt a simpler,
if less complete, solution. We added the notion of "non-monitored locks"
to the NLM, so that the PC could request a lock without provoking
the server into bombarding it with SM RPC calls. And we adopted
the strategy of requiring you to remount the drive (hence clearing
all the state on both the server and client). Yes, it's
inconvenient, and there are one or two things we can do in the
future to make it less so.
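
To make the tradeoff concrete, here is a small sketch of what a
non-monitored lock gives up. It is a toy model, not PC-NFS code: the
Server and PCDrive classes, the "monitored" flag and the free_all call
are all assumptions made for the illustration.

```python
# Toy model of monitored vs. non-monitored locks. Every name here is
# hypothetical and the behaviour is deliberately simplified.

class Server:
    def __init__(self):
        self.locks = {}         # path -> client name
        self.monitored = set()  # clients the SM would notify after a reboot

    def lock(self, client, path, monitored=True):
        if self.locks.get(path, client) != client:
            return False
        self.locks[path] = client
        if monitored:
            self.monitored.add(client)
        return True

    def reboot(self):
        # The lock table is lost; the monitored list survives (the real
        # SM keeps it on disk). Only monitored clients get notified.
        self.locks.clear()
        return sorted(self.monitored)

    def free_all(self, client):
        # Drop every lock held for this client.
        self.locks = {p: c for p, c in self.locks.items() if c != client}


class PCDrive:
    def __init__(self, name, server):
        self.name = name
        self.server = server
        self.believed_locks = set()  # what the PC thinks it holds

    def lock(self, path):
        # Non-monitored: the server will not call the PC back after a reboot.
        if self.server.lock(self.name, path, monitored=False):
            self.believed_locks.add(path)

    def remount(self):
        # Unmount and remount: clear the state on both server and client.
        self.server.free_all(self.name)
        self.believed_locks.clear()


server = Server()
pc = PCDrive("pc1", server)
pc.lock("/export/ledger.dat")
server.lock("unixbox", "/export/other.dat")

notified = server.reboot()
stale = pc.believed_locks - set(server.locks)  # PC state the server has lost
pc.remount()
```

After the reboot only the monitored client is notified, so the PC is
left believing in a lock the server no longer knows about. The remount
is the blunt instrument that brings both sides back into agreement.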

I hope this has been useful. We know that we need to document
all of this better, and in fact a number of publications are
on the way from several different sources. (Dunno why the
first one isn't out yet....)

Geoff

-- Geoff Arnold, PC-NFS architect, Sun Microsystems. (geoff@East.Sun.COM) --

To receive a full copy of my .signature, please dial 1-900-GUE-ZORK.
Each call will cost you one zorkmid.