aglew@oberon.crhc.uiuc.edu (Andy Glew) (07/24/90)
The Bus Abandonment Lock 1. What this is - ---- ---- -- In my quest for interesting things to talk about on comp.arch, to get us out of the vfork wars, I thought that I might talk in an informal manner about some of the work that I am doing with my advisor, Wen-Mei Hwu. 2. Summary - ------- The burst of bus transactions on transfer of a spinlock can be eliminated by supporting the ability to abandon a pending bus request, when bus traffic is observed that makes it unnecessary. This gives performance comparable to that of the Graunke and Thakkar xmem queuing lock, in a manner that is more independent of the instruction set architecture, and permits more flexible software implementations. 3. Background - ---------- comp.arch readers should not need to have the importance of synchronization operations explained to them. The excess bus traffic that can be produced by a simple test-and-set spinlock is well-known. Also well known is the test-and- test-and-set principle, introduced by Rudolph and Segall, of spinning on a cached copy and only going to the bus with an interlocked atomic operation when the lock is observed to be free. Anderson et al [SIGMETRICS 89] point out that test- and-test-and-set spinlocks produce a burst of bus traffic when a lock is released that is being waited for by several processors. Several authors propose a mixture of hardware and software solutions for this burst. Probably the best is the Graunke and Thakkar xmem queue lock, described in IEEE COMPUTER magazine. Although it eliminates the burst, the Graunke queue lock has several problems which can be remedied by the bus abandonment lock described here. The rest of this post describes how we repeated the measure- ments described in Graunke and Thakkar, with a software simulator that allows us to change bus parameters more easily than they could, including in the comparison a new style of lock, the bus abandonment lock, that has the same performance as the Graunke queue lock but which does not have its problems. 4. Graunke Queue Lock - ------- ----- ---- Read the COMPUTER article for a description of the Graunke queue lock. Although a wonderful software only solution, the Graunke queue lock has the following problems: It requires an atomic exchange-with-memory operation, which not all processors have. This, it is instruction set dependent. It is vulnerable to preemption. It is difficult to imple- ment lock query operations, or to decide that you do not want to wait for a lock after you have been put on the queue. 5. The Bus Abandonment Lock - --- --- ----------- ---- 5.1. Analysis - - -------- Scenario analysis for the burst of bus transactions on a lock transfer shows that the problem is the race from the test- of the lock in cache, to the atomic test-and-set per- formed on the bus. Many processors pass the test- at the same time, and proceed to issue test-and-set operations. All of the test-and-set operations proceed one-by-one to the bus, causing the burst. If the pending bus test-and-set bus requests can be aban- doned when the first of them goes through, the burst is eliminated. (Yes, this is just Ultracomputer style combining, although we avoid the term combining because it has become a bit of a dirty word among systems engineers). Providing bus abandonment gives a spinlock much like the old, familiar, spinlock, without the Graunke queue lock's problems (aforementioned and otherwise). 5.2. Implementation - - -------------- On first realizing that only this single feature, abandoning pending bus requests, was the key to eliminating this burst of bus transactions that so many people were going to such lengths to avoid, we imagined that implementing bus abandon- ment must be hard. Actually, many existing bus protocols, ranging from the Amiga Zorro III through the VMEbus and up to FUTUREbus+, support abandoning bus requests in some way or other. Moreover, several processors already abandon bus requests like read or write when they have been made unnecessary (eg. they may snarf the read data off the bus, or cancel writes). Abandonment has just not been applied to synchronization, though. 5.3. Simple Experiments - - ------ ----------- We built a simple simulator, "synchsim", to evaluate these synchronization algorithms. We used the same benchmark, a simple multiprocessor increment a counter in a critical sec- tion, as Graunke and Thakkar. Our simulator lets us change bus scheduling algorithms and timing, something that is dif- ficult to do on real hardware. Like Graunke and Thakkar, a simple test-and-set spinloop that goes to the bus on every iteration is the worst syn- chronization algorithm in terms of bus traffic. However, we found the total number of bus transactions required to per- form the benchmark to be linear in the number of processors, not squared, as they mention off-hand. We describe this in more detail below. A test-and-test-and-set spinloop is also linear in the number of processors, due to the burst. We found Graunke and Thakkar's queue lock to require a con- stant amount of bus traffic, for multiple processors (actu- ally, we found a small linear factor, but it is practically negligible). By itself, a test-and-test-and-set style spinlock using bus abandonment also requires a constant amount of bus traffic, but slightly more than Graunke's queue algorithm, However, if you add the ability to check if a lock is present in the cache first, and if you broadcast the writes of the lock, both on lock release and when the lock is acquired, the bus abandonment lock is better than Graunke's queue lock. Finally, if your bus supports a special test-and-set opera- tion, like Encore's Nanobus, bus abandonment provides a lock that is considerably better than Graunke's. (As a check, the special synchronization operation without abandonment produces linear bus traffic). The quantitative results of these experiments are available, but we didn't include them here because graphs don't look good in ascii. Thus, the performance of the bus abandonment lock is con- stant in the number of processors, and is basically identi- cal to Graunke's queue lock. However, because we still have the classical test-and-set spinlock, the bus abandonment lock does not suffer the software problems of Graunke's lock. Moreover, the bus abandonment lock technique can be applied to any existing interlocked atomic operation; it does not require an exchange-with-memory operation operation in the instruction set, although bus abandonment can be applied to xmem if desired. 5.3.1. Worst Case - - - ----- ---- As mentioned above, there appears to be a bit of confusion as to the nature of the the worst case burst of bus transac- tions on a lock transfer. It is not that test-and-set is O(n^2), and test-and-test-and-set is O(n). Indeed, test- and-set cannot really be said to have a "burst", since its bus traffic is continuous. Actually, it is the type of bus scheduling that changes the order of the burst. we demonstrate this with the same benchmark as before, except with a delay added so that the burst will quiet down, using a test-and-test-and-set FIFO scheduling provides a burst that is linear in the number of processors, as does pure FIXED priority scheduling. How- ever, FIXED priority bus scheduling, with variations such as broadcasting the release of the lock (snarfing in the first test-) can produce O(n^2) bus traffic in the burst that is much worse. The same thing can happen with variations of other bus scheduling policies. Moral: when considering a bus scheduling policy, consider the effect on test-and-test-and-set spinloop bus traffic (or ensure that TTS isn't used in your system). 5.4. Further Work - - ------- ---- The conventional way of integrating interlocked atomic operations with the cache is to have the first read make the data exclusive. This is what Sequent does. Using bus aban- donment means that you can use a shared lock, by initiating a bus write to lock the data, abandoning the write if some other processor beats you to it. 6. Conclusion - ---------- If you are implementing spinlocks on an existing machine that supports an exchange-with-memory operation consider the Graunke queue lock. If you are implementing a new machine, consider either sup- porting xmem, so the Graunke queue lock can be implemented, or consider supporting the ability to abandon pending bus requests. If conditional locks or multiple lock spinning is important to you, seriously consider the bus abandonment lock. For OS applications (or systems where the user can control preemption): if preemption while waiting for the lock, or in the critical section, is a potential problem, then the bus abandonment lock is preferable to the Graunke queue lock. -- Andy Glew, andy-glew@uiuc.edu Propaganda: UIUC runs the "ph" nameserver in conjunction with email. You can reach me at many reasonable combinations of my name and nicknames, including: andrew-forsyth-glew@uiuc.edu andy-glew@uiuc.edu sticky-glue@uiuc.edu and a few others. "ph" is a very nice thing which more USEnet sites should use. There is an info-ph mailing list, or, contact Steve-Dorner@uiuc.edu. UIUC has ph wired into email and whois.