[news.software.notes] possible locking problem

garym@telesoft.UUCP (Gary Morris @flash) (02/26/88)

We have been having occasional problems when two users try to post a response
to the same note at the same time.  It appears that the index file gets
updated incorrectly as what we end up with is one response where the text is
not what the person posted but is random text from some other part of the
notesfile. 

We are running notes on a Sun3/150 and it is mounted via nfs (soft r/w) to
about 20 other Sun2 and Sun3 machines.  It is very heavily used and we often
have 3 or 4 people simultaneously reading the same notesfile or responding,
usually from different machines (via nfs). 

Has anyone seen this kind of behaviour?  I thought the lock files were
supposed to prevent this problem.
--GaryM
-- 
Gary Morris -- UUCP:  ucbvax!ucsd!telesoft!garym
N6FRT                 mcvax!enea!log-hb!garym

wunder@hpcea.CE.HP.COM (Walter Underwood) (02/27/88)

NFS is not safe for creat() locking, according to our NFS wizards.
There is a race condition which can allow two clients to create the
same file, and you seem to be exercising it.
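
For anyone who hasn't met the idiom, here is a minimal sketch of
creat()-style locking as it behaves on a local filesystem.  The function
names are mine, for illustration only, and are not taken from the notes
source.  Creating the file with mode 0 means a second creat() by a
non-root user fails with a permission error, and that failure is what
tells the caller the lock is held.  It is this "second creat fails"
guarantee that does not carry over to NFS.

```c
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: try to take a lock by creating `path` with mode 0.
 * Returns 0 if we got the lock, -1 if it appears to be held already.
 * Assumes a local filesystem and a non-root caller (root bypasses the
 * permission check, so the second creat() would succeed for root).
 */
int creat_lock(const char *path)
{
    int fd = creat(path, 0);    /* mode 0: nobody may write the file */

    if (fd < 0)
        return -1;              /* existing, unwritable file: lock is held */
    close(fd);                  /* the file's existence is the lock */
    return 0;
}

/* Hypothetical helper: release the lock by removing the lock file. */
int creat_unlock(const char *path)
{
    return unlink(path);
}
```

A caller would loop on creat_lock() with a sleep between attempts and
call creat_unlock() when it leaves the critical section.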

wunder

jthomp@convex.UUCP (03/27/88)

/* Written  4:36 pm  Feb 25, 1988 by garym@telesoft.Sun.COM in convex:news.software.notes */
/* ---------- "possible locking problem" ---------- */
>> We have been having occasional problems when two users try to post a response
>> to the same note at the same time.  It appears that the index file gets
>> updated incorrectly as what we end up with is one response where the text is
>> not what the person posted but is random text from some other part of the
>> notesfile. 
>> 
>> We are running notes on a Sun3/150 and it is mounted via nfs (soft r/w) to
>> about 20 other Sun2 and Sun3 machines.  It is very heavily used and we often
>> have 3 or 4 people simultaneously reading the same notesfile or responding,
>> usually from different machines (via nfs). 
>> 
>> Has anyone seen this kind of behaviour?  I thought the lock files were
>> supposed to prevent this problem.

While I have yet to see this type of behaviour, I can imagine what
its cause might be.

You're NFS mounting your notes partition all over the world.
Notes has an internal locking mechanism that depends on creat(2)
failing when the lock file already exists.  Over NFS that doesn't hold:
if the file exists and you call ret = creat("foo",0); you may still get
a successful return (a file descriptor) even though the file existed!
(i.e. you've just blown the lock.)
The solution is to substitute the following code in
'locknf' (in misc.c (notes 1.7)):

changed code follows:  (sorry this is so muddy, but it's about 3 am here.)
Note (pun?) that I haven't actually tried this code, but I'm about
to do the same thing here at Convex.  Incidentally, R. Kolstad is my boss.

/*
 * lock creates a lock file, or waits until it can create the lock. lock
 * files are of the form lock#  where # is a character passed to the routine. 
 *
 * Rob Kolstad	10/20/80 modified: rbe December 1981 to add full path name
 * for lock file 
 */

locknf (io, c)
struct io_f *io;
char    c;
{
    register int i, fd, holderr, trys;
    long    old_seek_place;		/* file offset to restore after locking */
/*    char    p[WDLEN];  /* not needed */

    /* not needed for lockf */
    /* sprintf (p, "%s/%s/%c%s", Mstdir, LOCKS, c, io -> nf); */

    trys = LOCKTRY;			/* set him up */

    /*
     * old bad code while ((i = creat (p, 0)) < 0) 
     */
    switch (c) {			/* determine which type of lock we
					 * want */
      case DSCRLOCK:			/* the lock-type macro, not a quoted string */
	fd = io->fidrdx;
	break;
      case TXTLOCK:
	fd = io->fidtxt;
	break;
      default:
	fprintf (stderr, "unknown lock %c in lockf, bye\n", c);
	exit (BAD);
    }

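    /*
     * save the seek pointer for later restoration, then rewind: lockf
     * locks from the current offset, and we want the whole file
     */
    if ((old_seek_place = lseek (fd, 0L, 1)) < 0 || lseek (fd, 0L, 0) < 0) {
	/* do something particularly bad here, like exit */
	perror ("fatal notes error lseek");
	exit (BAD);
    }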

    while ((i = lockf (fd, F_TLOCK, 0)) < 0) {
	if (trys-- == 0) {
	    holderr = errno;		/* before it's abused */
	    fprintf (stderr, "lock %c (%s) permanently locked - consult a guru\n",
		     c, io->nf);
	    /* there is usually 'nfmaint' code here */
	    ttystop ();
	    exit (BAD);			/* jthomp thinks we should recover
					 * from this. */
	}
	sleep (2);			/* guarantee at least 1 */
    }
    (void) lseek (fd, old_seek_place, 0);	/* set the pointer back */
    ignoresigs++;			/* critical section */

    /*
     * could be above getting the lock, but wanted to be able to suspend
     * while getting the lock.  The interruptible window is very small 
     */

    /*
     * old bad code -- can't close the fd, the lock goes away.  --jt close
     * (i); 
     */
}

/*
 * unlock takes the same arguments as the lock routine, and it will remove
 * the corresponding lock file 
 *
 * Rob Kolstad 10/20/80 modified: rbe December 1981 to add full path name for
 * lock name 
 */

unlocknf (io, c)
struct io_f *io;
char    c;
{
    int     fd;				/* for the filedesc */

    /* char    p[WDLEN];  /* not needed */
    switch (c) {			/* determine which type of lock we
					 * want */
      case DSCRLOCK:			/* the lock-type macro, not a quoted string */
	fd = io->fidrdx;
	break;
      case TXTLOCK:
	fd = io->fidtxt;
	break;
      default:
	fprintf (stderr, "unknown lock %c in lockf, bye\n", c);
	exit (BAD);
    }

    /* sprintf (p, "%s/%s/%c%s", Mstdir, LOCKS, c, io -> nf);  /* unnecessary */
    /* generate file name */
    /* x (unlink (p) < 0, "unlock: unlink lock");	/* old, bad code */
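    {
	long    here;			/* caller's file offset */

	here = lseek (fd, 0L, 1);	/* remember where we were */
	(void) lseek (fd, 0L, 0);	/* the lock was taken from offset 0 */
	/* lockf(3) spells the unlock command F_ULOCK */
	x (lockf (fd, F_ULOCK, 0L) < 0, "unlock: bad lockf call");
	(void) lseek (fd, here, 0);	/* put the pointer back */
    }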
    ignoresigs--;			/* no longer critical */
}
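
For comparison, here is the same idea in present-day POSIX C, as a sketch
rather than a drop-in patch.  lock_whole_file and unlock_whole_file are
hypothetical names, and the retry count and delay are parameters I made
up.  lockf locks from the current offset, so both helpers rewind to
offset 0 first and put the file pointer back afterwards.  Whether the
lock is honored across NFS clients depends on the lock daemon setup,
which this sketch does not address.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700       /* ask for the lockf() declaration */
#endif
#include <fcntl.h>              /* open() flags, for callers */
#include <unistd.h>
#include <sys/types.h>

/*
 * Hypothetical helper: lock the whole file behind fd (which must be open
 * for writing).  Makes up to `tries` non-blocking attempts, sleeping
 * `delay_secs` between them.  Returns 0 on success, -1 on failure.
 * The caller's file offset is preserved.
 */
int lock_whole_file(int fd, int tries, unsigned delay_secs)
{
    int rc = -1;
    off_t here = lseek(fd, 0, SEEK_CUR);    /* remember current offset */

    if (here == (off_t)-1 || lseek(fd, 0, SEEK_SET) == (off_t)-1)
        return -1;
    while (tries-- > 0) {
        if (lockf(fd, F_TLOCK, 0) == 0) {   /* length 0: to end of file */
            rc = 0;
            break;
        }
        if (tries > 0)
            sleep(delay_secs);
    }
    if (lseek(fd, here, SEEK_SET) == (off_t)-1)
        return -1;                          /* could not restore offset */
    return rc;
}

/* Hypothetical helper: release a lock taken by lock_whole_file(). */
int unlock_whole_file(int fd)
{
    int rc;
    off_t here = lseek(fd, 0, SEEK_CUR);

    if (here == (off_t)-1 || lseek(fd, 0, SEEK_SET) == (off_t)-1)
        return -1;
    rc = lockf(fd, F_ULOCK, 0);             /* unlock from offset 0 on */
    if (lseek(fd, here, SEEK_SET) == (off_t)-1)
        return -1;
    return rc;
}
```

Note that the descriptor has to stay open while the lock is wanted,
since closing it drops the lock.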

/* End of text from convex:news.software.notes */
