[comp.realtime] Spin locks

david@marvin.jpl.oz (David Magnay) (01/22/91)

Can someone please explain spinlocks? Are these a form of semaphore?

Thanks

..........................................................
"You learn a lot as you go thru life,
 but not usually what you expected"
..........................................................
David Magnay				mail	david@marvin.jpl.oz
Boral Elevators				was: Johns Perry Lifts
45 Wangara Road, Cheltenham 3192	phone	(03) 584-3311
Victoria, Australia			O/seas	+61 3 584 3311

ewoods@hemel.bull.co.uk (Eoin Woods) (01/22/91)

david@marvin.jpl.oz (David Magnay) writes:

>Can someone please explain spinlocks? Are these a form of semaphore?

Hmm, interesting question!
Basically yes.

The simplest spin-lock is a piece of code which looks something like:

	while (check_lock (lockA))	/* true while lockA is held */
	   ;				/* do nothing: just spin */

The essential thing about a spin-lock (if I can remember my real-time
systems courses) is that the process concerned loops until the lock it is
waiting for is released.  This can be inefficient in a real-time
environment, since the spinning process burns CPU cycles and makes no
progress until the lock is released.
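
Fleshed out a little: a real spin-lock needs an atomic test-and-set, so
that two processes cannot both see the lock as free and claim it at the
same time.  A minimal sketch in C, using the C11 <stdatomic.h>
primitives (the names lock_t, spin_lock and spin_unlock are just
illustrative):

	#include <stdatomic.h>

	typedef atomic_flag lock_t;	/* one bit of lock state */

	void spin_lock (lock_t *l)
	{
	    /* test-and-set returns the flag's previous value, so we
	       keep retrying until we are the one who set it */
	    while (atomic_flag_test_and_set (l))
	        ;			/* spin */
	}

	void spin_unlock (lock_t *l)
	{
	    atomic_flag_clear (l);	/* release the lock */
	}

A lock is then declared as:  lock_t lockA = ATOMIC_FLAG_INIT;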

A good reference for this sort of information is:

	"Real-Time Systems Design", 2nd ed, Allworth and Zobel, Macmillan.

Eoin.
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~    Eoin Woods, Software Development Group, Bull HN Information Systems,   ~
~                Maxted Road, Hemel Hempstead, Herts HP2 7DZ, UK.           ~
~                Tel : +44 442 232222 x4823   Fax : +44 442 234084          ~
~      < Eoin.Woods@hemel.bull.co.uk  or   ...!uunet!ukc!brno!ewoods>       ~
~          < When do we start news group comp.os.emacs ?  :-) >             ~

prent@interc.UUCP (01/24/91)

In the Concurrent multiprocessor implementations, as in a number of
others, the spinlock technique is almost perfectly efficient due to
the hardware architecture.  The presumption is only that the
checking processor really has nothing better to do than wait for the
resource being locked.  Given this, the locking processor locks
the lock with a single instruction, no OS overhead, etc.  The
checking processor fetches the lock (once only) from shared memory...
thereafter (while "spinning"), it is merely looping in its own cache,
placing no load on the memory hardware.  The eventual
unlocking of the lock causes a cache-invalidate signal to be generated
and (a single) subsequent real-memory read.  Net memory traffic is
one read if the lock was free to begin with, two otherwise.  Cheap, and
the requesting processor can go about its business within two memory
cycles of the resource becoming available, which is less than the
execution time of many single instructions.
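
In code, this cache-friendly behaviour is the "test-and-test-and-set"
loop: spin on an ordinary read (served from the local cache after the
first fetch) and attempt the bus-locking atomic operation only when the
lock looks free.  A sketch in C with <stdatomic.h> (function names are
illustrative):

	#include <stdatomic.h>

	void spin_lock (atomic_int *lock)	/* 0 = free, 1 = held */
	{
	    for (;;) {
	        /* plain read: after the first fetch this loops in the
	           local cache and puts no load on the memory bus */
	        while (atomic_load_explicit (lock, memory_order_relaxed))
	            ;
	        /* looks free: now try the expensive atomic exchange */
	        if (!atomic_exchange_explicit (lock, 1, memory_order_acquire))
	            return;			/* we won the lock */
	    }
	}

	void spin_unlock (atomic_int *lock)
	{
	    /* this store invalidates the spinners' cached copies and
	       triggers the single re-read described above */
	    atomic_store_explicit (lock, 0, memory_order_release);
	}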
 
Compute in Good Health (and that Quickly),
Prentiss Mooney

leong@interc.UUCP (01/25/91)

I don't know about you guys out there, but I consider CPU cycles
extremely precious.  First of all, spinlocks seem to be an
acceptable KLUGE for small applications, but I sure do not want
to design my Real-Time -- Flight Critical -- Life Saving -- System
with the initial flaws of techniques such as spinlocks!

I guess the scenario of shared-memory multisystem environments
might warrant a need for something like spinlocks, but I would guess
that there are other means of providing synchronization, whether
through additional software messaging or hardware signalling.

By today's real-time standards of microsecond context switching, I
find it hard to believe that designers would consider spinlocks.

Leong

dcousins@bbn.com (Dave Cousins) (01/30/91)

fox@rudolf.nscl.msu.edu (Ron Fox) writes:

>operation, and you need some external thing (either an event, included
>here is a pre-empting process), or another CPU to break the loop.

I use spin locks on a shared memory architecture, basically because
interrupt processing on our system was not fast enough to build a
semaphore on.  Spin locks have low latency, especially when the system
releasing the lock (i.e. writing the lock bit) does so by cycle-stealing
from the polling CPU.

Dave Cousins

brucej@verdix.com (Bruce Jones) (01/31/91)

In article <3600010@interc> leong@interc.UUCP writes:
>I don't know about you guys out there, but I consider CPU cycles
>extremely precious.  First of all, spinlocks seem to be an
>acceptable KLUGE for small applications, but I sure do not want
>to design my Real-Time -- Flight Critical -- Life Saving -- System
>with the initial flaws of techniques such as spinlocks!
	[paragraph deleted...]
>By today's real-time standards of microsecond context switching, I
>find it hard to believe that designers would consider spinlocks.

Spin locks are used on shared memory multiprocessors _because_ CPU
cycles are precious,  not because designers like kludges.

Used properly,  spin locks are extremely efficient and completely
safe.  Used improperly, they are extremely dangerous,  as you have 
pointed out.

Consider a shared resource protected by a "sleepy" lock such as AT&T
UNIX semaphores.  When there is contention for the resource,  the
cost of gaining it includes the cost of executing a system call,
the cost of the time the kernel spends putting the process
to sleep and waking it up again, and the cost of at least two context
switches.

Then change the sleepy lock into a spin lock.  Now the cost of gaining
the lock is the time spent spinning.  This time depends on the spin
lock implementation,  the amount of contention,  and the length of
time the lock is held by a process once gained.  When the lock is
held for only a short length of time and contention is low,  spin locks
are a huge win over sleepy locks.

Measure the cost of a system call and a few context switches on
your system.  Better yet,  measure the cost of a SysV semaphore
call! :)  Then measure the time needed to execute one or two or six
test-and-set instructions.  On a 68030 the ratio is likely to be
almost 500 to 1.
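
A back-of-the-envelope version of that measurement might look like this
(a sketch only: the iteration count is arbitrary and both locks are
uncontended, so it captures per-call overhead rather than contention):

	#include <stdio.h>
	#include <stdatomic.h>
	#include <sys/time.h>
	#include <sys/ipc.h>
	#include <sys/sem.h>

	#define ITERS 100000L

	static double elapsed_us (struct timeval a, struct timeval b)
	{
	    return (b.tv_sec - a.tv_sec) * 1e6 + (b.tv_usec - a.tv_usec);
	}

	int main (void)
	{
	    struct timeval t0, t1;
	    struct sembuf down = { 0, -1, 0 }, up = { 0, 1, 0 };
	    int semid = semget (IPC_PRIVATE, 1, IPC_CREAT | 0600);
	    atomic_flag flag = ATOMIC_FLAG_INIT;
	    long i;

	    semop (semid, &up, 1);		/* start with the semaphore free */
	    gettimeofday (&t0, NULL);
	    for (i = 0; i < ITERS; i++) {	/* SysV semaphore lock/unlock pair */
	        semop (semid, &down, 1);
	        semop (semid, &up, 1);
	    }
	    gettimeofday (&t1, NULL);
	    printf ("semop pair: %.3f us\n", elapsed_us (t0, t1) / ITERS);

	    gettimeofday (&t0, NULL);
	    for (i = 0; i < ITERS; i++) {	/* test-and-set lock/unlock pair */
	        while (atomic_flag_test_and_set (&flag))
	            ;
	        atomic_flag_clear (&flag);
	    }
	    gettimeofday (&t1, NULL);
	    printf ("TAS pair:   %.3f us\n", elapsed_us (t0, t1) / ITERS);

	    semctl (semid, 0, IPC_RMID);	/* remove the semaphore */
	    return 0;
	}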

Anderson has published an in-depth study of spin lock implementation
and contention issues in IEEE Transactions on Parallel and Distributed
Systems.  There were some rather subtle and surprising results arising
from shared memory caches,
bus contention,  and the like.  He was able to demonstrate effective
spin lock algorithms for a wide range of environments.  He clearly
explained subtle flaws in naive implementations that can cause problems
in high contention SMMP machines.

There are several dangers inherent in spin locks.  The foremost is the
danger of uncontrolled spinning when the owner of a lock is aborted,
exits, or is simply context-switched out.  This is why spin locks are not
used in user programs running under UNIX,  but are used extensively in MP
implementations of UNIX.  The operating system can protect itself,  by
preventing context switches,  masking interrupts and the like,  but
in general these capabilities are not available to user programs.
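
The crude user-level compromise is the textbook spin-then-yield idea:
bound the spin, then give the CPU away so a preempted lock holder cannot
hang the spinner forever.  A sketch in C (the spin limit is an arbitrary
tuning knob, and this is *not* the scheme in the paper I mention below):

	#include <stdatomic.h>
	#include <sched.h>			/* sched_yield() */

	#define SPIN_LIMIT 1000

	void spin_then_yield_lock (atomic_flag *l)
	{
	    int tries = 0;
	    while (atomic_flag_test_and_set (l)) {
	        if (++tries >= SPIN_LIMIT) {	/* holder likely preempted */
	            sched_yield ();		/* give the CPU away */
	            tries = 0;
	        }
	    }
	}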

My friend Mark Heuser presented a paper at the last Usenix on his
implementation of safe,  reliable and efficient spin locks for user
programs under UNIX.  He found that a user application and the OS
can cooperate to produce safe spin locks with performance far exceeding
typical context switch times, without any system calls or context switches.
His work is awesome,  and I'm sorry I don't work for that company anymore...

I'm sending this off from home,  and my references are at work.  I will
be glad to email you bibliography entries if you drop me some email.

	-- brucej

--------------------------------
Bruce Jones
Verdix Corporation,  Aloha OR
brucej@verdix.com

chris@yarra.oz.au (Chris Jankowski) (01/31/91)

In article <3600010@interc> leong@interc.UUCP writes:
>
> I don't know about you guys out there, but I consider CPU cycles
> extremely precious.  First of all, spinlocks seem to be an
> acceptable KLUGE for small applications, but I sure do not want
> to design my Real-Time -- Flight Critical -- Life Saving -- System
> with the initial flaws of techniques such as spinlocks!

In multiprocessor (shared memory on a bus) systems CPU cycles are cheap
compared to, e.g., the cost of accessing memory.  That changes the whole
perspective.  Moreover, the cost of processors is falling and they are
becoming faster and faster, thereby imposing a higher and higher load on
memory and the *bus*.  I am talking here about *single* systems in the
sense that they run a single copy of an operating system.
>
> By today's real-time standards of microsecond context switching, I
> find it hard to believe that designers would consider spinlocks.
>
For the systems mentioned above, spinlocks are the basic and cheapest
tool used within the OS to lock critical sections of OS code.
This works fine.  The best implementations provide an efficiency of about
97% for, say, 8-processor systems on a typical mixture of tasks.

The same concept may be useful outside the OS area, e.g. for real-time
applications.  But again, you had better have lots of processors;
definitely more than one.  And of course it only makes sense to use
spinlocks to synchronise events with finite and low waiting times.

Again we have a tradeoff: spinlocks provide a low-cost, low-latency
synchronisation mechanism at the price of burning CPU cycles.

      -m-------   Chris Jankowski - Senior Systems Engineer   chris@yarra.oz.au
    ---mmm-----   Pyramid Technology Corporation Pty. Ltd.  fax  +61 3 521 3799
  -----mmmmm---   1st Floor, 553 St. Kilda Road             tel. +61 3 525 1730
-------mmmmmmm-   Melbourne, Victoria, 3004       AUSTRALIA       (03) 521 3799

``If you're crossing the nation in a covered wagon, it's better to have four
  strong oxen than 100 chickens.  Chickens are OK but we can't make them work 
  together yet.'' - Ross Bott, Pyramid U.S., on multiprocessors at AUUGM '89.

vestal@SRC.Honeywell.COM (Steve Vestal) (01/31/91)

In article <3600010@interc> leong@interc.UUCP writes:

>> acceptable KLUGE for small applications, but I sure do not want
>> to design my Real-Time -- Flight Critical -- Life Saving -- System
>> with the initial flaws of techniques such as spinlocks!

It's not clear to me that spin-locks should be completely ruled out for
reliable real-time systems.  Essentially, multiple processes contending for
the same lock are queued using a random discipline.  I admit this is not as
conceptually nice as being queued using some priority discipline.  However, in
many real-time systems the maximum number of processes that might ever be
queued on a semaphore can be bounded (by some small number).  It would seem
that the total blocking time for any of these processes could be bounded by the
sum of the critical section execution times (plus some delta for the spin-lock
timing).  This may well be a larger blocking time for many of these processes
than could be achieved using, say, a priority inheritance protocol.  Except,
of course, where the priority inheritance lock/unlock times are large relative
to the sum of the critical section and spin-lock times.
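
As a made-up example: three processes sharing one spin lock, each holding
it for at most 50 microseconds, gives any one of them a worst-case
blocking time of

	2 * 50 us + delta  =  100 us + delta

where delta is the spin-lock detection overhead.  Whether that beats a
priority inheritance lock then reduces to weighing delta (plus the
test-and-set path) against the inheritance protocol's lock/unlock
overhead.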

Steve Vestal
Mail: Honeywell S&RC MN65-2100, 3660 Technology Drive, Minneapolis MN 55418 
Phone: (612) 782-7049                    Internet: vestal@src.honeywell.com

fox@rudolf.nscl.msu.edu (Ron Fox) (01/31/91)

In article <3600009@interc>, prent@interc.UUCP writes:
>
>In the Concurrent multiprocessor implementations, as in a number of
>others, the spinlock technique is almost perfectly efficient due to
>the hardware architecture.  The presumption is only that the
>checking processor really has nothing better to do than wait for the
>resource being locked. 

  This is the key assumption, and depending on what you use spinlocks
for, it may or may not be true.  To me, the place to use spin locks
would be in the O/S, to lock shared resources such as the process
queues or distributed O/S tables.  The place in this architecture *NOT*
to use spinlocks would be in process-level IPCs, because during the
spin you could get better system-wide throughput by scheduling someone
else to run while waiting for the lock to be released.
Of course this should not be taken as a blanket statement.  If a
cooperating application *knows* things about its partners, or has
special timing requirements for the latency of detecting the release
of the lock (this is comp.realtime after all), then all bets are off
and the actual processor usage profile is less important than the
timing requirements of the application.

> [coherent caching discussion eliminated here.. ]

	Ron

Ron Fox                     | FOX@MSUNSCL.BITNET      | Where the name 
NSCL                        | FOX@RUDOLF.NSCL.MSU.EDU | goes on before
Michigan State University   | MSUHEP::RUDOLF::FOX     | the quality
East Lansing, MI 48824-1321 |                         | goes in.
USA

ikw@genrad.UUCP (Isaac K. Wong) (02/01/91)

In article <68639@yarra.oz.au> chris@yarra.oz.au writes:
>
>For the systems mentioned above, spinlocks are the basic and cheapest
>tool used within the OS to lock critical sections of OS code.
>This works fine.  The best implementations provide an efficiency of about
>97% for, say, 8-processor systems on a typical mixture of tasks.
>
>The same concept may be useful outside the OS area, e.g. for real-time
>applications.  But again, you had better have lots of processors;
>definitely more than one.


I agree.  Spin locks work.  They work especially well in multi-processor
shared-memory systems running no OS.  I implemented such a scheme in one
of my projects here.  It's a multi-channel real-time signal analysis
system utilizing multiple processor boards and shared memory.  As some
other netters have pointed out, when one processor board is looping over
a spin lock, it has nothing better to do!

--
Isaac Wong
GenRad, Structural Products Div.
Milpitas, CA
...!sun!topo!wongi   or wongi@topo.uucp

jat@xavax.com (John Tamplin) (02/03/91)

Spin locks are quite useful in multiprocessor systems, at least in terms
of synchronization between OS's on different processors accessing common
data structures.

However, it is still not perfect.  If n processors try to acquire the
lock simultaneously, one will get it.  The others will spin on local
copies in their respective caches.  When the lock is released, the cache
line gets ping-ponged between the caches until one processor wins the
lock again.  This costs n-1 bus cycles moving ownership of the cache
line between the processors.
You do it again next time, and so on, until the last processor wins the lock.
Granted, this is worst case, but you can still have O(n^2) bus cycles to get
the lock to all processors.  Additionally, there is only a probabilistic
guarantee of fairness.

If the resources are available, I think hardware semaphores are a much
neater solution.  With the gap between processor and memory speeds increasing,
it is reasonable to assume that large-scale multiprocessors will use some
sort of "request packet" structure (i.e., the processor sends a message "read
location 20 and send me the result" and blocks until it gets a response).
With this structure, it would be easy to implement hardware semaphores
that queue requests and send a response to a processor when it owns the
semaphore.  With the depths available in FIFOs, you won't have to worry about
a limited queue, and the interface could be as simple as:

	dummy=[sem0]	/* block until lock granted */
	... critical section ...
	[sem0]=dummy	/* release lock */
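
or, viewed from C (the mapped address and names are hypothetical, just
to show the shape of the interface):

	#define SEM0_ADDR 0xfff00000u	/* hypothetical mapping */

	volatile unsigned *sem0 = (volatile unsigned *) SEM0_ADDR;
	unsigned dummy;

	dummy = *sem0;	/* read queues us; reply arrives when we own the lock */
	/* ... critical section ... */
	*sem0 = dummy;	/* write passes the lock to the next queued requester */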

This solution guarantees O(n) bus cycles (cycle in this case being a request
packet and a response packet) to solve the same problem as before, which is
optimal.  Also, you can enforce whatever kind of fairness you desire, even
giving certain processors higher priority and ordering the queue by that
priority.  None of this requires any changes to the processor itself, although
the bus interface would have to already support a disconnect-style bus.
-- 
John Tamplin						Xavax
jat@xavax.COM						2104 West Ferry Way
...!uunet!xavax!jat					Huntsville, AL 35801