[net.sources.d] P & V Algorithms needed

gwyn@brl-smoke.ARPA (Doug Gwyn ) (03/15/86)

In article <300@imagen.UUCP> geof@imagen.UUCP (Geoffrey Cooper) writes:
>Mutex locking among N processors can be implemented using N single bit
>variables.  The only restriction is that it be possible to write any
>one variable without affecting the others.  It is not necessary that
>reads and writes be atomic.
>
>The algorithm works by having one variable associated with each
>processor.  Each processor writes to its own variable and reads the
>variables of all others.  Fairness is assumed not to be at issue.

If I understand you correctly, this scheme does not work.  Consider:
	Start with all flags clear.
	Processor A wants to enter a critical region,
		so it starts to set the A-flag (after
		reading all the other processor flags
		and finding them all clear).
	Concurrently, processor B wants to enter the
		same critical region, so it starts to
		read the processor flags to see if
		some other processor is in the region.
		In particular, it tries to read the
		A-flag.
	Dependent on the outcome of a timing race, it
		is now possible for both processors
		to be executing in the critical region.

I have never heard of a mutual exclusion scheme that did not
at root depend on the existence of an interlocked test-and-set
operation of some kind.  I would be interested in hearing some
details about other schemes if they exist.

chris@umcp-cs.UUCP (Chris Torek) (03/15/86)

In article <1828@brl-smoke.ARPA> gwyn@brl-smoke.UUCP writes:
>I have never heard of a mutual exclusion scheme that did not
>at root depend on the existence of an interlocked test-and-set
>operation of some kind.  I would be interested in hearing some
>details about other schemes if they exist.

Here is one that, while probably completely useless in practice,
does not require a test-and-set scheme:

Assign a shared address as the `owner' address.  It contains the
index number of the CPU owning the resource being locked.  If a
CPU owns it but does not want it, it increments it, modulo the
number of CPUs.  If a CPU does not own the resource, it continuously
reads the location until its index appears.  Unfortunately, this
requires that each CPU busy wait for the resource, and that each
check the owner address very often so as to keep the resource
owner moving.

If you are willing to dedicate a CPU as a `lock server', the
algorithm can be changed to this:  In addition to the current owner,
there is a request address; and there is a distinguished `idle'
index.  If a CPU does not own the resource, but wants it, it puts
its own index into the request address (continuously, alternating
with reading the owner address until its index shows up).  When a
CPU that owns the resource is done with it, it writes the idle
index into the owner address.  The idle-index CPU's job is to hold
on to the resource until someone else wants it.  Again this requires
a busy wait, but processors that neither own nor want the resource
are not required to look at the owner address.  To cut down on
memory contention, one can make the idle-index CPU very slow, but
this increases latency.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1415)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@mimsy.umd.edu

kre@munnari.OZ (Robert Elz) (03/16/86)

In article <1828@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn) writes:
> I have never heard of a mutual exclusion scheme that did not
> at root depend on the existence of an interlocked test-and-set
> operation of some kind.  I would be interested in hearing some
> details about other schemes if they exist.

What about CSMA/CD - an ethernet is a shared resource, with
no interlocked test & set, yet it seems that it's occasionally
possible to get exclusive access to the thing...

The important properties are the ability to detect that a collision
has occurred, and the ability to go back and start again.

I have used this scheme in the more conventional case of
two processors with shared memory and no test&set to
do interlocking.  The algorithm goes like this:

	look see if the other processor has the resource locked,
	if so, wait

	set my lock bit

	look see if the other processor has the resource locked,
	if so, reset my lock bit, and go back to the start

	I own it.

The "wait" can be a busy wait, or it can mean just try for the
resource sometime later, whichever is appropriate.  The "go back
to the start" step should have some random delay associated with it
to avoid continuous collisions (deadlock).

I'd be interested to know if anyone can discover any way that
this can be defeated (very interested, since this algorithm
is actually in use, right now, on some hardware I have here...).

I haven't actually thought this through for a case of N processors,
N > 2, but I can't immediately see any reason it wouldn't function,
after all, ethernets work with more than 2 processors...

Robert Elz		kre@munnari.oz		seismo!munnari!kre

ps: I am moving this discussion into net.sources.d where it belongs

jas@mulga.UUCP (03/16/86)

In article <1070@munnari.OZ> kre@munnari.OZ (Robert Elz) writes:
>...
>The important properties are the ability to detect that a collision
>has occurred, and the ability to go back and start again.
>...
>	look see if the other processor has the resource locked,
>	if so, wait
>
>	set my lock bit
>
>	look see if the other processor has the resource locked,
>	if so, reset my lock bit, and go back to the start
>
>	I own it.
>...

This algorithm looks like a simplified version of Dekker's
algorithm, the one they always give in undergraduate concurrent
programming courses.
In that context it is considered to have two problems which arise
because you aren't allowed to assume anything about the relative
speeds of the processes:
(1) deadlock, maybe your scheme fixes this most of the time
	(I say "maybe" because while it's very unlikely that deadlock
	will occur, it is still possible in some degenerate circumstances)
(2) starvation, one process never gets a go, because the other one always
	manages to do its tests first; once again, in reality, while this
	is possible, it is *highly* unlikely.

>I haven't actually thought this through for a case of N processors,
>N > 2, but I can't immediately see any reason it wouldn't function,

Just use generalised Dekker's algorithm, or one of the improved versions
published recently.

jas

geoff@desint.UUCP (Geoff Kuenning) (03/17/86)

In article <1828@brl-smoke.ARPA> gwyn@brl.ARPA writes:

> I have never heard of a mutual exclusion scheme that did not
> at root depend on the existence of an interlocked test-and-set
> operation of some kind.  I would be interested in hearing some
> details about other schemes if they exist.

Dijkstra invented one way back in 1965 [1] that depends only on atomic
loads and stores.  It has been extended by Knuth and later others [2]
to prevent starvation.

[1] Dijkstra, E.W.  Solution of a problem in concurrent programming
control.  Comm. ACM 8, 9 (Sept. 1965), 569.
[2] Eisenberg, M.A. and McGuire, M.R.  Further comments on Dijkstra's
concurrent programming control problem.

The following algorithm is presented in [2];  it guarantees that each
requesting processor will receive the resource within N turns. (NOTES:  I
translated this from GOTO loops in Algol;  I have not tested it and it
is not certified.  The code as written assumes multiprocessors and uses
spin loops for busy waiting.)

	#define N	4	/* Number of processors */

	static char control[N];	/* Control array, starts all zero */
	static int whohasit = 0; /* Current owner of the resource */

	/* Acquire exclusive access to the resource for the i'th processor */
	void acquire_resource (i)
	    int i;		/* Processor number */
	    {
	    int j;		/* Loop index */

	    while (1)
		{
		control[i] = 1;	/* Request access to the resource */
		j = whohasit;
		while (j != i)	/* Scan from the current owner to us */
		    {
		    if (control[j] != 0)
			j = whohasit; /* Somebody of higher priority */
				/* ..wants it, restart the scan */
		    else
			j = (j + 1) % N;
		    }
		control[i] = 2;	/* We think we have the resource now */
		for (j = 0;  j < N;  j++)
		    {
		    if (j != i  &&  control[j] == 2)
			break;	/* Somebody else thinks THEY have it */
		    }
		if (j != N)
		    continue;	/* Somebody else thinks they have it, retry */
		if (control[whohasit] != 0  &&  whohasit != i)
		    continue;	/* Somebody else really does have it */
		whohasit = i;	/* Now we really DO have it */
		return;
		}
	    }

	/* Release exclusive access to the resource */
	void release_resource (i)
	    int i;		/* Processor number */
	    {
	    int j;		/* Loop index */

	    j = (i + 1) % N;	/* Pass the turn to the next waiter */
	    while (control[j] == 0)	/* ..our own flag is still nonzero, */
		j = (j + 1) % N;	/* ..so this search always stops */
	    whohasit = j;	/* ..this is what bounds the wait to N turns */
	    control[i] = 0;
	    }
-- 

	Geoff Kuenning
	{hplabs,ihnp4}!trwrb!desint!geoff

geof@imagen.UUCP (Geoffrey Cooper) (03/17/86)

> In article <1828@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn) writes:
> > I have never heard of a mutual exclusion scheme that did not
> > at root depend on the existence of an interlocked test-and-set
> > operation of some kind.

You are correct, the scheme that you outlined doesn't work.  The hint is
that the correct scheme isn't fair.  Try this code:
    #define N <number of processors>
    #define TRUE 1
    #define FALSE 0

    shared boolean locks[N];

    lock(me)
        int me; /* my processor number */
    {
        int i;
    
        do {
            locks[me] = TRUE;
            for ( i = 0; i < me && ! locks[i]; i++ );
            if ( i < me ) locks[me] = FALSE;
        } while ( ! locks[me] );

        do {
            for ( i = me+1; i < N && ! locks[i]; i++ );
        } while ( i < N );
    }
    
    unlock(me)
        int me; /* my processor number */
    {
        locks[me] = FALSE;
    }

Gold star to Robert Elz.  I agree that there is a great similarity
with Aloha contention (and CSMA/CD).  In these schemes, static priorities
are replaced by random backoff times.  This eliminates static priority
but gives only a probabilistic (though high) chance of success.  The
probabilistic schemes are also somewhat fairer under expected network
traffic than a static scheme would be.

Note that each processor writes to only one variable and reads from
all the others.  Thus test-and-set is not needed.  I developed this
algorithm one day in deciding whether read-modify-write cycles were
an absolute requirement for a bus architecture.  Interestingly, atomic
reads and writes are also unnecessary, since no processor writes the
lock belonging to any other processor.  However, glitching of a value
from true to false and back is not allowed.

I refer you to the 3rd PODC Conference Proceedings, ACM 1984 (reprinted
in SIGOPS OS Review, Oct. 85): "Solved Problems, Unsolved Problems, and
Nonproblems in Concurrency" by Leslie Lamport.  The 2-processor version
of the above is outlined as an oft-forgotten solved problem.  Actually,
there is a bug in the algorithm as presented in the paper, but it's
probably a typo.

As Chris Torek pointed out, a token contention scheme can also be used.
This is a "fairer" scheme, but does require active participation
on the part of all parties, even when they are not interested in the
state of the lock.

- Geof Cooper

taylor@glasgow.glasgow.UUCP (Jem Taylor) (03/18/86)

In article <1828@brl-smoke.ARPA> gwyn@brl.ARPA writes:
>In article <300@imagen.UUCP> geof@imagen.UUCP (Geoffrey Cooper) writes:
>>Mutex locking among N processors can be implemented using N single bit
>>variables.  [...]
>>The algorithm works by having one variable associated with each
>>processor.  Each processor writes to its own variable and reads the
>>variables of all others.  Fairness is assumed not to be at issue.
>
>If I understand you correctly, this scheme does not work.  Consider:
>	Start with all flags clear.
>	Processor A wants to enter a critical region,
>		so it starts to set the A-flag (after
>		reading all the other processor flags
>		and finding them all clear).

Alternatively :
	Start with all flags clear.
	A sets its flag and *then* examines the other flags, finds
		them clear and enters.
	B concurrently sets its flag and then examines the others,
		finds A set, and does not enter ...

What can B now do ?  Busy wait on the other flags ... but if C also joins
in, there will be a deadlock even when A leaves the section
and clears its flag, since the flags for B and C will still be set.

Taking a tip from Ethernet, B (and C etc) could pause for some random
interval (after first clearing its flag) and try again from the beginning.
As long as B's and C's delays are not (systematically) the same, one of 
them will enter the section eventually.

Such a scheme has the provisos that Ethernet is known for - good
performance only under light to moderate load, and unfairness - but
(I'm told) Ethernet works.

-Jem Taylor

				"Aerodynamically speaking, bees can't fly"

geoff@desint.UUCP (Geoff Kuenning) (03/18/86)

In article <1070@munnari.OZ> kre@munnari.OZ (Robert Elz) writes:

> What about CSMA/CD - an ethernet is a shared resource, with
> no interlocked test & set, yet it seems that its occasionally
> possible to get exclusive access to the thing...

Ethernet uses special hardware that can detect another signal during
transmission.  In effect, the exclusive access is detected after the
fact:  you get a different return code from the hardware if a
transmission was garbled by a collision, and must retry later to
get your exclusive access.  A random-timeout scheme increases the
probability that the net will be free at that later time.  The CPU
equivalent would be an instruction that interrupted if somebody else
tried to write simultaneously;  this is equivalent to a "test and set".

> 	look see if the other processor has the resource locked,
> 	if so, wait
> 
> 	set my lock bit
> 
> 	look see if the other processor has the resource locked,
> 	if so, reset my lock bit, and go back to the start
> 
> 	I own it.

This is basically Dijkstra's algorithm (see my earlier posting);  it has
been proven to be correct (if B checks A's bit before it has been set, B's
lock bit must have already been set and thus A is guaranteed to see it in
the test after A sets its lock bit).  The algorithm's only flaw is that
it can starve a processor in theory (though in practice it's rarely a
problem).  A bigger problem in multiprocessors driven off the same clock
is that it is possible for the algorithm to get into lock step and stick
both processors in the wait-for-the-other loop.  Anything that differs
between processors (interrupts, clock, DMA I/O) has the potential for
breaking this loop;  another way is to have each processor count its
processor number down to zero before returning to the main loop.
-- 

	Geoff Kuenning
	{hplabs,ihnp4}!trwrb!desint!geoff

jim@alberta.UUCP (Jim Easton) (03/22/86)

***

>I have never heard of a mutual exclusion scheme that did not
>at root depend on the existence of an interlocked test-and-set
>operation of some kind.  I would be interested in hearing some
>details about other schemes if they exist.

It is a sad commentary to note that someone has to go and "invent" a
special instruction for this when most of the old machines had one that
was ideally suited for the purpose and was useful for other things as well.

The instruction was "exchange memory and A (register)".  The only
requirement on the hardware is that the instruction cannot be
interrupted in mid-execution - which came for free with core memory.

I believe the key to this problem is to have only one "item" that cannot
be duplicated.  Every process makes a grab for it, but the one that gets
it is the one that gets the resource.

I personally call that "item" a "ticket".  It is like a hockey ticket - the
person who has it gets to sit in the corresponding seat at the game and the
people that issue the tickets know that there cannot be collisions because
there is only one ticket (/seat/game).

Anyway, that's how I think of it, and if the machine has a way of arranging
for a "ticket" the algorithm will work - otherwise I wouldn't bet on it.

	Jim Easton (..!alberta!jim)