rmd@r3.cs.man.ac.uk (01/16/89)
I don't know how MIPS ensure synchronisation, but it is possible to implement a TAS instruction with a sequence of simpler instructions! As part of the MUSHROOM project we are designing a RISC style CPU with support for multi-processing. MUSHROOM does not have a synchronistation instruction, but synchronistation is maintained by clever use of the way the pipeline is resarted and a little co-operation form the interrupt handlers. When an instruction is interrupted the pipeline is normally re-started from that instruction, but in the case of the instruction in the single delay slot following a branch instruction, the pipeline is re-started at the branch instruction. This means that if the branch and its delay slot instruction are both executed they *MUST* have been executed indivisibly. This can be used to produce an indivisible test-and-store operation. What we need to do now is to ensure that no interrupts have occurred between the time the value was loaded and this test-and-store. This works something along the lines of : start: clear flag load value if flag set goto: start store new The `flag' is kept permamently in a global register. Any interrupt routines which could trample over this code must set the flag. Thus, if this sequence of instructions completes, the load and store have been carried out indivisibly. Result: one TAS operation at the cost of one cycle added to some interrupt routines, a global register and this extra bit of code instead of a single, multi-cycle, hardware TAS instruction. For the class of problems for which this processor is intended (object-oriented / symbolic processing) our figures suggest that this software solution will be more cost-effective than adding the complexity of the synchronisation mechanisms to the hardware. This scheme could suffer complications with operations such as DMA, but since in the MUSHROOM processor everything is done using interrupts we do not suffer that problem. So long and thanks for all the fish : Rhod ------------------------------------------------------------------ Rhodri M. Davies, Room 2.90, Department of Computer Science, University of Manchester, Oxford Road, Manchester, M13 9PL, U.K. Tel: +44-61-275 2000 ext 6149 Internal: (Owens) 6149 JANET: rmd@uk.ac.man.cs.r5 UUCP: mcvax!ukc!man.cs.r5!rmd ------------------------------------------------------------------ So long and thanks for all the fish : Rhod ------------------------------------------------------------------ Rhodri M. Davies, Room 2.90, Department of Computer Science, University of Manchester, Oxford Road, Manchester, M13 9PL, U.K. Tel: +44-61-275 2000 ext 6149 Internal: (Owens) 6149 JANET: rmd@uk.ac.man.cs.r5 UUCP: mcvax!ukc!man.cs.r5!rmd ------------------------------------------------------------------
mark@hubcap.UUCP (Mark Smotherman) (01/18/89)
In article <5434@ux.cs.man.ac.uk>, rmd@r3.cs.man.ac.uk writes: > ... it is possible to > implement a TAS instruction with a sequence of simpler instructions! > ... with support for multi-processing. ^^^^^^^^^^^^^^^^ From the article, it appears that multi-tasking on a single CPU is being discussed rather than multiprocessor support as was the apparent intent of the original MIPS Arch. Ques. posting. > interrupt routines which could trample over this code must set the I'm not sure I understand why TAS would be used between ISRs and user processes. If the TAS is used (ala semaphore signal) to unblock a (spinning) user process, then a simple store will suffice. If, instead, the TAS is used to implement mutual exclusion between user processes and ISRs, then a deadlock seems to occur: i.e., if the user code holds the TAS lock and an ISR is invoked, then the ISR spins on the lock and cannot return to user code to allow release of the lock. Is the purpose, then, a conditional wait by low-priority ISRs to conditionally enter a critical section? (i.e. the highest priority ISR could always assume indivisible operation and check the lock using separate load and compare-and-branch insts.) For ISR/user process mutual exclusion on a single CPU, I'm more familiar with disabling interrupts during user critical sections and low-priority ISRs. E.g. single-processor UNIX. If the TAS is used for mutual exclusion between user processes, then a fair share scheduler must be implemented, since static process priorities can lead to deadlock. Also, a blocking form of IPC seems more appropriate than spinning. I am more familiar with the use of TAS as a method of *inter- processor* synchronization for use by *disabled* OS code. E.g. see the programming notes in IBM S/370 Princ. Ops. for TAS. Thus, using both interrupt disabling and TAS, one could implement, say, semaphore wait and signal operations for a multiprocessor. So, I guess I need more info on why you would want TAS for single CPU operation. Am I missing something? Do ISRs in this case want the equivalent of a conditional semaphore wait? > ... This scheme could suffer complications with operations such as DMA, and with mutliprocessor operation! -- Mark Smotherman, Comp. Sci. Dept., Clemson University, Clemson, SC 29634 INTERNET: mark@hubcap.clemson.edu UUCP: gatech!hubcap!mark
brooks@vette.llnl.gov (Eugene Brooks) (01/18/89)
In article <4124@hubcap.UUCP> mark@hubcap.UUCP (Mark Smotherman) writes: >In article <5434@ux.cs.man.ac.uk>, rmd@r3.cs.man.ac.uk writes: >> ... it is possible to >> implement a TAS instruction with a sequence of simpler instructions! >> ... with support for multi-processing. > ^^^^^^^^^^^^^^^^ > From the article, it appears that multi-tasking on a single CPU > is being discussed rather than multiprocessor support as was the > apparent intent of the original MIPS Arch. Ques. posting. Although it is true that the article in question was not talking true multiprocessor support, it is possible to establish protection of a critical region with only read and write operations. It takes 5 memory operations, except when a collision occurs between processors in which it takes a scan of an array having an element for each processor. The algorithm can be quite competitive (in the absence of collisions) on systems where the special "test and set" instructions run slower due to implementation hardware choices. The algorithm, due to Leslie Lamport, breaks even with Sequent's Atomic Lock Memory used on their Balance series, but is slower than the indivisible memory operations supported on their Symmetry series. A multiprocessor using the MIPS chip does not have much of a choice, one can either use Lamport's algorithm or implement the equivalent of Atomic Lock Memory. If the Atomic Lock Memory is implemented with a slow bus interface Lamport's algorithm may be faster. A more important synchronization primitive, barrier synchronization, is best done with an algorithm using read and write to shared memory and not locked counters, in any event. There are quite of few tricky synchronization tricks which use read and write only, and some of them are the "best" way to get the particular synchronization done.