[comp.arch] Synchronisation

rmd@r3.cs.man.ac.uk (01/16/89)

I don't know how MIPS ensure synchronisation, but it is possible to
implement a TAS instruction with a sequence of simpler instructions!

  As part of the MUSHROOM project we are designing a RISC style CPU
with support for multi-processing.  MUSHROOM does not have a
synchronistation instruction, but synchronistation is maintained by
clever use of the way the pipeline is resarted and a little
co-operation form the interrupt handlers.

  When an instruction is interrupted the pipeline is normally
re-started from that instruction, but in the case of the instruction
in the single delay slot following a branch instruction, the pipeline
is re-started at the branch instruction. This means that if the branch
and its delay slot instruction are both executed they *MUST* have been
executed indivisibly. This can be used to produce an indivisible
test-and-store operation.

  What we need to do now is to ensure that no interrupts have occurred
between the time the value was loaded and this test-and-store. This
works something along the lines of :

   start:	clear flag
	   	load value
	   	if flag set goto: start
	   	store new 
  The `flag' is kept permamently in a global register.  Any
interrupt routines which could trample over this code must set the
flag. Thus, if this sequence of instructions completes, the load and
store have been carried out indivisibly.

  Result: one TAS operation at the cost of one cycle added to some
interrupt routines, a global register and this extra bit of code
instead of a single, multi-cycle, hardware TAS instruction. For the
class of problems for which this processor is intended
(object-oriented / symbolic processing) our figures suggest that this
software solution will be more cost-effective than adding the
complexity of the synchronisation mechanisms to the hardware.

  This scheme could suffer complications with operations such as DMA,
but since in the MUSHROOM processor everything is done using
interrupts we do not suffer that problem.

So long and thanks for all the fish :
					Rhod

------------------------------------------------------------------
Rhodri M. Davies,    Room 2.90,    Department of Computer Science,
University of Manchester,  Oxford Road,  Manchester, M13 9PL, U.K.
Tel: +44-61-275 2000 ext 6149		    Internal: (Owens) 6149
JANET: rmd@uk.ac.man.cs.r5           UUCP: mcvax!ukc!man.cs.r5!rmd
------------------------------------------------------------------

So long and thanks for all the fish :
					Rhod

------------------------------------------------------------------
Rhodri M. Davies,    Room 2.90,    Department of Computer Science,
University of Manchester,  Oxford Road,  Manchester, M13 9PL, U.K.
Tel: +44-61-275 2000 ext 6149		    Internal: (Owens) 6149
JANET: rmd@uk.ac.man.cs.r5           UUCP: mcvax!ukc!man.cs.r5!rmd
------------------------------------------------------------------

mark@hubcap.UUCP (Mark Smotherman) (01/18/89)

In article <5434@ux.cs.man.ac.uk>, rmd@r3.cs.man.ac.uk writes:
> ... it is possible to
> implement a TAS instruction with a sequence of simpler instructions!
> ... with support for multi-processing.
                       ^^^^^^^^^^^^^^^^
	From the article, it appears that multi-tasking on a single CPU
	is being discussed rather than multiprocessor support as was the
	apparent intent of the original MIPS Arch. Ques. posting.

> interrupt routines which could trample over this code must set the

	I'm not sure I understand why TAS would be used between ISRs
	and user processes.  If the TAS is used (ala semaphore signal) to
	unblock a (spinning) user process, then a simple store will
	suffice.  If, instead, the TAS is used to implement mutual
	exclusion between user processes and ISRs, then a deadlock seems
	to occur: i.e., if the user code holds the TAS lock and an ISR
	is invoked, then the ISR spins on the lock and cannot return
	to user code to allow release of the lock.

	Is the purpose, then, a conditional wait by low-priority ISRs to
	conditionally enter a critical section?  (i.e. the highest
	priority ISR could always assume indivisible operation and
	check the lock using separate load and compare-and-branch insts.)

	For ISR/user process mutual exclusion on a single CPU, I'm more
	familiar with disabling interrupts during user critical sections
	and low-priority ISRs.  E.g. single-processor UNIX.

	If the TAS is used for mutual exclusion between user processes,
	then a fair share scheduler must be implemented, since static
	process priorities can lead to deadlock.  Also, a blocking form
	of IPC seems more appropriate than spinning.

	I am more familiar with the use of TAS as a method of *inter-
	processor* synchronization for use by *disabled* OS code.  E.g.
	see the programming notes in IBM S/370 Princ. Ops. for TAS.
	Thus, using both interrupt disabling and TAS, one could implement,
	say, semaphore wait and signal operations for a multiprocessor.

	So, I guess I need more info on why you would want TAS for single
	CPU operation.  Am I missing something?  Do ISRs in this case
	want the equivalent of a conditional semaphore wait?

> ...  This scheme could suffer complications with operations such as DMA,

	and with mutliprocessor operation!
-- 
Mark Smotherman, Comp. Sci. Dept., Clemson University, Clemson, SC 29634
INTERNET: mark@hubcap.clemson.edu    UUCP: gatech!hubcap!mark

brooks@vette.llnl.gov (Eugene Brooks) (01/18/89)

In article <4124@hubcap.UUCP> mark@hubcap.UUCP (Mark Smotherman) writes:
>In article <5434@ux.cs.man.ac.uk>, rmd@r3.cs.man.ac.uk writes:
>> ... it is possible to
>> implement a TAS instruction with a sequence of simpler instructions!
>> ... with support for multi-processing.
>                       ^^^^^^^^^^^^^^^^
>	From the article, it appears that multi-tasking on a single CPU
>	is being discussed rather than multiprocessor support as was the
>	apparent intent of the original MIPS Arch. Ques. posting.

Although it is true that the article in question was not talking true
multiprocessor support, it is possible to establish protection of a
critical region with only read and write operations.  It takes 5 memory
operations, except when a collision occurs between processors in which
it takes a scan of an array having an element for each processor. The algorithm
can be quite competitive (in the absence of collisions) on systems where
the special "test and set" instructions run slower due to implementation
hardware choices.  The algorithm, due to Leslie Lamport, breaks even with
Sequent's Atomic Lock Memory used on their Balance series, but is slower than
the indivisible memory operations supported on their Symmetry series.  A
multiprocessor using the MIPS chip does not have much of a choice,
one can either use Lamport's algorithm or implement the equivalent of
Atomic Lock Memory.  If the Atomic Lock Memory is implemented with a slow
bus interface Lamport's algorithm may be faster.

A more important synchronization primitive, barrier synchronization, is
best done with an algorithm using read and write to shared memory and not
locked counters, in any event.  There are quite of few tricky synchronization
tricks which use read and write only, and some of them are the "best" way
to get the particular synchronization done.