phil@osiris.UUCP (Philip Kos) (03/31/89)
I got lots of replies to my query about a way to make advisory locking work over NFS. The consensus seems to be to use fcntl (or lockf), which uses lockd to communicate lock information to the NFS server. Thanks (so far; the replies are still coming in) to Vic Abell, Glenn Barry (who even sent me source for a "flock" that uses fcntl), Mark Sommer, Stephen X. Nahm, Jim Grande, Guy Harris, and Robert Claeson.

Now a problem: it doesn't seem to work. There are a couple of things. First, when we (the local Sun sys admin and I) tried running my test the first time after starting up the statd and lockd processes on the clients, we noticed that all of the test processes immediately hung in state D. This sounds like a problem with lockd that one of the respondents mentioned; could someone elaborate on it? Anyway, we looked for the lockd processes we had started and they had *all died*. After restarting them, the test processes started running. Bizarre, but non-fatal.

The second thing is that even when the test processes run OK, the results are NOT OK: the common log file (which the processes are all trying to get exclusive locks on) contains much overwritten text.

Just so you all know what I'm dealing with, I rewrote the locking function I was talking about so that it uses the following algorithm.

1. Try to lock the common log file using fcntl with type == F_WRLCK and whence == start == len == 0. If fcntl fails with errno EAGAIN or EACCES (the latter is apparently equivalent to EAGAIN on Pyramid OSx despite claims in the manpage that that feature hasn't been implemented yet :-) try up to 7 more times after short sleeps. If the lock cannot be granted after eight fcntl calls, or the reason for failure is "non-trivial", return an error code.

2. Seek (actually fseek()) to EOF on the common log file.

3. Write the message (fprintf()/fflush()) to the common log file.

4. Unlock the log file and return.

Does it sound like anything I'm doing here is wrong?
This code works great on Pyramids running NFS; it only fails on our Suns. I could, if I absolutely had to, use write instead of fprintf. It would be very ugly and very painful to implement, so I don't think it buys anything, but I could be wrong.

There is of course the possibility that we're not running the right lockd, statd, or something else on the Suns. I don't have the faintest idea how to find out. (We had to start up statd and lockd on our client workstations just to test this, btw. They don't normally run those daemons because they're sufficiently bulky to cause complaints about response time.) If anyone suspects this to be the problem, let me know, and also let me know how I can find out. I'm a total novice at anything NFS-ish.

Phil Kos