[comp.arch] The Bus Abandonment Lock

aglew@oberon.crhc.uiuc.edu (Andy Glew) (07/24/90)
                  The Bus Abandonment Lock


1.  What this is
-   ---- ---- --

In  my  quest  for  interesting  things  to  talk  about  on
comp.arch, to get us out of the vfork wars, I thought that I
might talk in an informal manner about some of the work that
I am doing with my advisor, Wen-Mei Hwu.


2.  Summary
-   -------

The burst of bus transactions on transfer of a spinlock  can
be eliminated by supporting the ability to abandon a pending
bus request, when bus traffic  is  observed  that  makes  it
unnecessary.   This  gives performance comparable to that of
the Graunke and Thakkar xmem queuing lock, in a manner  that
is more independent of the instruction set architecture, and
permits more flexible software implementations.


3.  Background
-   ----------

comp.arch readers should not need to have the importance  of
synchronization  operations  explained  to them.  The excess
bus traffic that can be produced by  a  simple  test-and-set
spinlock  is  well-known.   Also well known is the test-and-
test-and-set principle, introduced by Rudolph and Segall, of
spinning  on a cached copy and only going to the bus with an
interlocked atomic operation when the lock is observed to be
free.   Anderson  et al [SIGMETRICS 89] point out that test-
and-test-and-set spinlocks produce a burst  of  bus  traffic
when  a lock is released that is being waited for by several
processors.  Several authors propose a mixture  of  hardware
and software solutions for this burst.  Probably the best is
the Graunke and Thakkar xmem queue lock, described  in  IEEE
COMPUTER  magazine.   Although  it eliminates the burst, the
Graunke  queue  lock  has  several  problems  which  can  be
remedied by the bus abandonment lock described here.

The rest of this post describes how we repeated the measure-
ments  described  in  Graunke  and  Thakkar, with a software
simulator that allows  us  to  change  bus  parameters  more
easily  than  they  could, including in the comparison a new
style of lock, the bus abandonment lock, that has  the  same
performance  as  the  Graunke  queue lock but which does not
have its problems.


4.  Graunke Queue Lock
-   ------- ----- ----

Read the COMPUTER article for a description of  the  Graunke
queue  lock.   Although  a wonderful software only solution,
the Graunke  queue  lock  has  the  following  problems:  It
requires an atomic exchange-with-memory operation, which not
all processors have.  This, it is instruction set dependent.
It  is  vulnerable to preemption.  It is difficult to imple-
ment lock query operations, or to decide  that  you  do  not
want  to  wait  for  a  lock  after you have been put on the
queue.


5.  The Bus Abandonment Lock
-   --- --- ----------- ----

5.1.  Analysis
- -   --------

Scenario analysis for the burst of  bus  transactions  on  a
lock  transfer  shows  that the problem is the race from the
test- of the lock in cache, to the atomic test-and-set  per-
formed  on  the  bus.  Many processors pass the test- at the
same time, and proceed  to  issue  test-and-set  operations.
All of the test-and-set operations proceed one-by-one to the
bus, causing the burst.

If the pending bus test-and-set bus requests  can  be  aban-
doned  when  the  first  of  them goes through, the burst is
eliminated.

(Yes, this is just Ultracomputer style  combining,  although
we avoid the term combining because it has become a bit of a
dirty word among systems engineers).

Providing bus abandonment gives a  spinlock  much  like  the
old,  familiar,  spinlock,  without the Graunke queue lock's
problems (aforementioned and otherwise).


5.2.  Implementation
- -   --------------

On first realizing that only this single feature, abandoning
pending  bus requests, was the key to eliminating this burst
of bus transactions that so many people were going  to  such
lengths to avoid, we imagined that implementing bus abandon-
ment must be hard.  Actually, many existing  bus  protocols,
ranging  from  the Amiga Zorro III through the VMEbus and up
to FUTUREbus+, support abandoning bus requests in  some  way
or  other.  Moreover, several processors already abandon bus
requests like  read  or  write  when  they  have  been  made
unnecessary  (eg.  they may snarf the read data off the bus,
or cancel writes).

Abandonment has just not been  applied  to  synchronization,
though.


5.3.  Simple Experiments
- -   ------ -----------

We built a simple simulator, "synchsim", to  evaluate  these
synchronization  algorithms.   We used the same benchmark, a
simple multiprocessor increment a counter in a critical sec-
tion,  as Graunke and Thakkar.  Our simulator lets us change
bus scheduling algorithms and timing, something that is dif-
ficult to do on real hardware.

Like Graunke and Thakkar,  a  simple  test-and-set  spinloop
that  goes  to  the bus on every iteration is the worst syn-
chronization algorithm in terms of bus traffic.  However, we
found  the total number of bus transactions required to per-
form the benchmark to be linear in the number of processors,
not  squared, as they mention off-hand.  We describe this in
more detail below.

A test-and-test-and-set  spinloop  is  also  linear  in  the
number of processors, due to the burst.

We found Graunke and Thakkar's queue lock to require a  con-
stant  amount of bus traffic, for multiple processors (actu-
ally, we found a small linear factor, but it is  practically
negligible).

By itself, a test-and-test-and-set style spinlock using  bus
abandonment  also requires a constant amount of bus traffic,
but slightly more than Graunke's queue  algorithm,  However,
if  you add the ability to check if a lock is present in the
cache first, and if you broadcast the writes  of  the  lock,
both  on lock release and when the lock is acquired, the bus
abandonment  lock  is  better  than  Graunke's  queue  lock.
Finally,  if your bus supports a special test-and-set opera-
tion, like Encore's Nanobus, bus abandonment provides a lock
that is considerably better than Graunke's.

(As a check, the special synchronization  operation  without
abandonment produces linear bus traffic).

The quantitative results of these experiments are available,
but  we  didn't  include them here because graphs don't look
good in ascii.

Thus, the performance of the bus abandonment  lock  is  con-
stant  in the number of processors, and is basically identi-
cal to Graunke's queue lock.  However, because we still have
the  classical  test-and-set  spinlock,  the bus abandonment
lock does not suffer  the  software  problems  of  Graunke's
lock.   Moreover,  the bus abandonment lock technique can be
applied to any existing  interlocked  atomic  operation;  it
does not require an exchange-with-memory operation operation
in the instruction set,  although  bus  abandonment  can  be
applied to xmem if desired.


5.3.1.  Worst Case
- - -   ----- ----

As mentioned above, there appears to be a bit  of  confusion
as to the nature of the the worst case burst of bus transac-
tions on a lock transfer.  It is not  that  test-and-set  is
O(n^2),  and  test-and-test-and-set  is O(n).  Indeed, test-
and-set cannot really be said to have a "burst",  since  its
bus traffic is continuous.

Actually, it is the type of bus scheduling that changes  the
order  of  the  burst.   we  demonstrate  this with the same
benchmark as before, except with a delay added so  that  the
burst  will  quiet  down, using a test-and-test-and-set FIFO
scheduling provides a burst that is linear in the number  of
processors,  as  does  pure FIXED priority scheduling.  How-
ever, FIXED priority bus scheduling, with variations such as
broadcasting  the release of the lock (snarfing in the first
test-) can produce O(n^2) bus traffic in the burst  that  is
much  worse.   The  same thing can happen with variations of
other bus scheduling policies.

Moral: when considering a bus  scheduling  policy,  consider
the effect on test-and-test-and-set spinloop bus traffic (or
ensure that TTS isn't used in your system).


5.4.  Further Work
- -   ------- ----

The  conventional  way  of  integrating  interlocked  atomic
operations with the cache is to have the first read make the
data exclusive.  This is what Sequent does.  Using bus aban-
donment  means that you can use a shared lock, by initiating
a bus write to lock the data, abandoning the write  if  some
other processor beats you to it.



6.  Conclusion
-   ----------

If you are implementing spinlocks  on  an  existing  machine
that supports an exchange-with-memory operation consider the
Graunke queue lock.

If you are implementing a new machine, consider either  sup-
porting  xmem, so the Graunke queue lock can be implemented,
or consider supporting the ability to  abandon  pending  bus
requests.

If conditional locks or multiple lock spinning is  important
to you, seriously consider the bus abandonment lock.

For OS applications (or systems where the user  can  control
preemption): if preemption while waiting for the lock, or in
the critical section, is a potential problem, then  the  bus
abandonment lock is preferable to the Graunke queue lock.

--
Andy Glew, andy-glew@uiuc.edu

Propaganda:
    
    UIUC runs the "ph" nameserver in conjunction with email. You can
    reach me at many reasonable combinations of my name and nicknames,
    including:

    	andrew-forsyth-glew@uiuc.edu
    	andy-glew@uiuc.edu
    	sticky-glue@uiuc.edu

    and a few others. "ph" is a very nice thing which more USEnet
    sites should use. There is an info-ph mailing list, or, contact
    Steve-Dorner@uiuc.edu. UIUC has ph wired into email and whois.