aglew@oberon.crhc.uiuc.edu (Andy Glew) (07/24/90)
The Bus Abandonment Lock
1. What this is
- ---- ---- --
In my quest for interesting things to talk about on
comp.arch, to get us out of the vfork wars, I thought that I
might talk in an informal manner about some of the work that
I am doing with my advisor, Wen-Mei Hwu.
2. Summary
- -------
The burst of bus transactions on transfer of a spinlock can
be eliminated by supporting the ability to abandon a pending
bus request, when bus traffic is observed that makes it
unnecessary. This gives performance comparable to that of
the Graunke and Thakkar xmem queuing lock, in a manner that
is more independent of the instruction set architecture, and
permits more flexible software implementations.
3. Background
- ----------
comp.arch readers should not need to have the importance of
synchronization operations explained to them. The excess
bus traffic that can be produced by a simple test-and-set
spinlock is well-known. Also well known is the test-and-
test-and-set principle, introduced by Rudolph and Segall, of
spinning on a cached copy and only going to the bus with an
interlocked atomic operation when the lock is observed to be
free. Anderson et al [SIGMETRICS 89] point out that test-
and-test-and-set spinlocks produce a burst of bus traffic
when a lock is released that is being waited for by several
processors. Several authors propose a mixture of hardware
and software solutions for this burst. Probably the best is
the Graunke and Thakkar xmem queue lock, described in IEEE
COMPUTER magazine. Although it eliminates the burst, the
Graunke queue lock has several problems which can be
remedied by the bus abandonment lock described here.
The rest of this post describes how we repeated the measure-
ments described in Graunke and Thakkar, with a software
simulator that allows us to change bus parameters more
easily than they could, including in the comparison a new
style of lock, the bus abandonment lock, that has the same
performance as the Graunke queue lock but which does not
have its problems.
4. Graunke Queue Lock
- ------- ----- ----
Read the COMPUTER article for a description of the Graunke
queue lock. Although a wonderful software only solution,
the Graunke queue lock has the following problems: It
requires an atomic exchange-with-memory operation, which not
all processors have. This, it is instruction set dependent.
It is vulnerable to preemption. It is difficult to imple-
ment lock query operations, or to decide that you do not
want to wait for a lock after you have been put on the
queue.
5. The Bus Abandonment Lock
- --- --- ----------- ----
5.1. Analysis
- - --------
Scenario analysis for the burst of bus transactions on a
lock transfer shows that the problem is the race from the
test- of the lock in cache, to the atomic test-and-set per-
formed on the bus. Many processors pass the test- at the
same time, and proceed to issue test-and-set operations.
All of the test-and-set operations proceed one-by-one to the
bus, causing the burst.
If the pending bus test-and-set bus requests can be aban-
doned when the first of them goes through, the burst is
eliminated.
(Yes, this is just Ultracomputer style combining, although
we avoid the term combining because it has become a bit of a
dirty word among systems engineers).
Providing bus abandonment gives a spinlock much like the
old, familiar, spinlock, without the Graunke queue lock's
problems (aforementioned and otherwise).
5.2. Implementation
- - --------------
On first realizing that only this single feature, abandoning
pending bus requests, was the key to eliminating this burst
of bus transactions that so many people were going to such
lengths to avoid, we imagined that implementing bus abandon-
ment must be hard. Actually, many existing bus protocols,
ranging from the Amiga Zorro III through the VMEbus and up
to FUTUREbus+, support abandoning bus requests in some way
or other. Moreover, several processors already abandon bus
requests like read or write when they have been made
unnecessary (eg. they may snarf the read data off the bus,
or cancel writes).
Abandonment has just not been applied to synchronization,
though.
5.3. Simple Experiments
- - ------ -----------
We built a simple simulator, "synchsim", to evaluate these
synchronization algorithms. We used the same benchmark, a
simple multiprocessor increment a counter in a critical sec-
tion, as Graunke and Thakkar. Our simulator lets us change
bus scheduling algorithms and timing, something that is dif-
ficult to do on real hardware.
Like Graunke and Thakkar, a simple test-and-set spinloop
that goes to the bus on every iteration is the worst syn-
chronization algorithm in terms of bus traffic. However, we
found the total number of bus transactions required to per-
form the benchmark to be linear in the number of processors,
not squared, as they mention off-hand. We describe this in
more detail below.
A test-and-test-and-set spinloop is also linear in the
number of processors, due to the burst.
We found Graunke and Thakkar's queue lock to require a con-
stant amount of bus traffic, for multiple processors (actu-
ally, we found a small linear factor, but it is practically
negligible).
By itself, a test-and-test-and-set style spinlock using bus
abandonment also requires a constant amount of bus traffic,
but slightly more than Graunke's queue algorithm, However,
if you add the ability to check if a lock is present in the
cache first, and if you broadcast the writes of the lock,
both on lock release and when the lock is acquired, the bus
abandonment lock is better than Graunke's queue lock.
Finally, if your bus supports a special test-and-set opera-
tion, like Encore's Nanobus, bus abandonment provides a lock
that is considerably better than Graunke's.
(As a check, the special synchronization operation without
abandonment produces linear bus traffic).
The quantitative results of these experiments are available,
but we didn't include them here because graphs don't look
good in ascii.
Thus, the performance of the bus abandonment lock is con-
stant in the number of processors, and is basically identi-
cal to Graunke's queue lock. However, because we still have
the classical test-and-set spinlock, the bus abandonment
lock does not suffer the software problems of Graunke's
lock. Moreover, the bus abandonment lock technique can be
applied to any existing interlocked atomic operation; it
does not require an exchange-with-memory operation operation
in the instruction set, although bus abandonment can be
applied to xmem if desired.
5.3.1. Worst Case
- - - ----- ----
As mentioned above, there appears to be a bit of confusion
as to the nature of the the worst case burst of bus transac-
tions on a lock transfer. It is not that test-and-set is
O(n^2), and test-and-test-and-set is O(n). Indeed, test-
and-set cannot really be said to have a "burst", since its
bus traffic is continuous.
Actually, it is the type of bus scheduling that changes the
order of the burst. we demonstrate this with the same
benchmark as before, except with a delay added so that the
burst will quiet down, using a test-and-test-and-set FIFO
scheduling provides a burst that is linear in the number of
processors, as does pure FIXED priority scheduling. How-
ever, FIXED priority bus scheduling, with variations such as
broadcasting the release of the lock (snarfing in the first
test-) can produce O(n^2) bus traffic in the burst that is
much worse. The same thing can happen with variations of
other bus scheduling policies.
Moral: when considering a bus scheduling policy, consider
the effect on test-and-test-and-set spinloop bus traffic (or
ensure that TTS isn't used in your system).
5.4. Further Work
- - ------- ----
The conventional way of integrating interlocked atomic
operations with the cache is to have the first read make the
data exclusive. This is what Sequent does. Using bus aban-
donment means that you can use a shared lock, by initiating
a bus write to lock the data, abandoning the write if some
other processor beats you to it.
6. Conclusion
- ----------
If you are implementing spinlocks on an existing machine
that supports an exchange-with-memory operation consider the
Graunke queue lock.
If you are implementing a new machine, consider either sup-
porting xmem, so the Graunke queue lock can be implemented,
or consider supporting the ability to abandon pending bus
requests.
If conditional locks or multiple lock spinning is important
to you, seriously consider the bus abandonment lock.
For OS applications (or systems where the user can control
preemption): if preemption while waiting for the lock, or in
the critical section, is a potential problem, then the bus
abandonment lock is preferable to the Graunke queue lock.
--
Andy Glew, andy-glew@uiuc.edu
Propaganda:
UIUC runs the "ph" nameserver in conjunction with email. You can
reach me at many reasonable combinations of my name and nicknames,
including:
andrew-forsyth-glew@uiuc.edu
andy-glew@uiuc.edu
sticky-glue@uiuc.edu
and a few others. "ph" is a very nice thing which more USEnet
sites should use. There is an info-ph mailing list, or, contact
Steve-Dorner@uiuc.edu. UIUC has ph wired into email and whois.