hamilton@siberia.rtp.dg.com (Eric Hamilton) (05/03/91)
In article <1991Apr22.175410.9840@decvax.dec.com>, kenton@abyss.zk3.dec.com (Jeff Kenton OSG/UEG) writes: |> |> The 88K has xmem, not test and set, which is guaranteed to be an atomic |> operation (the bus is locked between the read and write cycles). This |> can be used to create locks or compare and swap without disabling itnerrupts. |> Now I'm curious. How does one fabricate compare-and-swap out of xmem without disabling interrupts? Xmem has essentially the same restriction as test-and-set, which is that the stored value must be a constant independent of the loaded value. -- ---------------------------------------------------------------------- Eric Hamilton +1 919 248 6172 Data General Corporation hamilton@dg-rtp.rtp.dg.com 62 Alexander Drive ...!mcnc!rti!xyzzy!hamilton Research Triangle Park, NC 27709, USA
johnl@iecc.cambridge.ma.us (John R. Levine) (05/04/91)
In article <1991May2.201917.15062@dg-rtp.dg.com> hamilton@siberia.rtp.dg.com (Eric Hamilton) writes: >Now I'm curious. How does one fabricate compare-and-swap out of xmem >without disabling interrupts? You can't, at least not without a scheme that uses spin locks which are at least as bad as disabling interrupts. See Herlihy's article on "Wait-free Synchronization" in the January TOPLAS. It turns out that CAS is fundamentally more powerful than most other synchronization primitives. -- John R. Levine, IECC, POB 349, Cambridge MA 02238, +1 617 492 3869 johnl@iecc.cambridge.ma.us, {ima|spdcc|world}!iecc!johnl Cheap oil is an oxymoron.
kenton@abyss.zk3.dec.com (Jeff Kenton OSG/UEG) (05/06/91)
In article <1991May2.201917.15062@dg-rtp.dg.com>, hamilton@siberia.rtp.dg.com (Eric Hamilton) writes: |> In article <1991Apr22.175410.9840@decvax.dec.com>, kenton@abyss.zk3.dec.com (Jeff Kenton OSG/UEG) writes: |> |> |> |> The 88K has xmem, not test and set, which is guaranteed to be an atomic |> |> operation (the bus is locked between the read and write cycles). This |> |> can be used to create locks or compare and swap without disabling itnerrupts. |> |> |> Now I'm curious. How does one fabricate compare-and-swap out of xmem |> without disabling interrupts? Xmem has essentially the same restriction as |> test-and-set, which is that the stored value must be a constant independent of |> the loaded value. |> Once you've created a lock (with xmem) you can do almost anything within the locked region of code without disabling interrupts. ----------------------------------------------------------------------------- == jeff kenton Consulting at kenton@decvax.dec.com == == (617) 894-4508 (603) 881-0011 == -----------------------------------------------------------------------------
jthomas@nmsu.edu (James Thomas) (05/13/91)
In article <1991May6.113149.14531@decvax.dec.com> kenton@abyss.zk3.dec.com (Jeff Kenton OSG/UEG) writes:
jk> Once you've created a lock (with xmem) you can do almost anything
jk> within the locked region of code without disabling interrupts.
Once I've created a lock with Dekker's algorithm, I can do anything :-)
Jim
glew@pdx007.intel.com (Andy Glew) (05/13/91)
>Once you've created a lock (with xmem) you can do almost anything within the >locked region of code without disabling interrupts. Including being preempted, leaving all the other processors spinning for a lock held by a process which is swapped out. Gang scheduling is only half a solution. In my synchronization survey I call this the "spinning on a non-running processor" problem. So far, only the i860 has a general solution to this problem, with its operation that blocks interrupts for a short length of time from user mode. -- Andy Glew, glew@ichips.intel.com Intel Corp., M/S JF1-19, 5200 NE Elam Young Parkway, Hillsboro, Oregon 97124-6497 This is a private posting; it does not indicate opinions or positions of Intel Corp.
hamilton@siberia.rtp.dg.com (Eric Hamilton) (05/13/91)
In article <1991May6.113149.14531@decvax.dec.com>, kenton@abyss.zk3.dec.com (Jeff Kenton OSG/UEG) writes: |> In article <1991May2.201917.15062@dg-rtp.dg.com>, |> hamilton@siberia.rtp.dg.com (Eric Hamilton) writes: |> |> In article <1991Apr22.175410.9840@decvax.dec.com>, |> kenton@abyss.zk3.dec.com (Jeff Kenton OSG/UEG) writes: |> |> |> |> |> |> The 88K has xmem, not test and set, which is guaranteed to be an atomic |> |> |> operation (the bus is locked between the read and write cycles). This |> |> |> can be used to create locks or compare and swap without disabling |> itnerrupts. |> |> |> |> |> Now I'm curious. How does one fabricate compare-and-swap out of xmem |> |> without disabling interrupts? Xmem has essentially the same restriction as |> |> test-and-set, which is that the stored value must be a constant |> independent of |> |> the loaded value. |> |> |> |> Once you've created a lock (with xmem) you can do almost anything within the |> locked region of code without disabling interrupts. |> True, but implementing compare-and-swap is not one of those things. Two reasons, one of which is has nothing to do with interrupt disable. 1) Compare-and-swap will be atomic only with respect to other compare-and-swap operations (or other operations which honor the lock). It will not be atomic with respect to stores, nor to any non-CPU accesses, such as I/O references, ref/mod bit updates in page table entries, and the like. Note that xmem doesn't suffer from this defect. Of course, we can't remove this restriction even if we had interrupt disable. So, for present purposes, let's restrict compare-and-swap so that it is only atomic with respect to other compare-and-swap operations. Of course, this is a big restriction, and by itself may be enough to dispose of the misbegotten notion that test-and-set, exchange-memory, or the like are sufficient in an architecture. 2) Compare-and-swap is non-blocking, in the sense that it is guaranteed to complete one way or another, store or no store, succeed or fail, in a bounded time (to be useful the bound must be fairly low - a few tens of cycles is nice). The "you can do anything within the locked region" approach doesn't have this non-blocking property. Suppose one process holds the lock because it is in the middle of a compare-and-swap, and a second process tries to obtain the lock. The second process will block until the first process releases the lock. But there is no assurance that the first process will ever release the lock. It may take a time slice interrupt, be descheduled, swapped out, and languish there indefinitely. It may be killed. It may take a signal, and in its signal handler try to do another compare-and-swap, need another lock, and deadlock. Meanwhile, the second process will wait forever. The only way to guarantee the non-blocking behavior of compare-and-swap is to guarantee that whenever a process begins a compare-and-swap, it will expeditiously complete it and release the lock. And the only way to make this guarantee is to.... DISABLE INTERRUPTS WHILE THE LOCK IS HELD Note that xmem doesn't have this defect either. Xmem is non-blocking and atomic with respect to interrupts - there's no such thing as an interrupt in the middle of an xmem. The moral of this is roughly as follows: Atomic operations, such as compare-and-swap, that update memory with a value which is a function of the old value are generally useful. They allow you to do things that cannot be done with operations such as xmem or test-and-set, which update memory with a value that cannot be a function of the previous value. Compare-and-swap cannot be fabricated out of xmem or test-and-set without disabling interrupts or severely constraining its use. This is unsatisfactory for user code and many places inside the operating system. Therefore, cost-benefit tradeoffs permitting, one should make an effort to include compare-and-swap or something similar in the architecture. Omitting it is a defect. -- ---------------------------------------------------------------------- Eric Hamilton +1 919 248 6172 Data General Corporation hamilton@dg-rtp.rtp.dg.com 62 Alexander Drive ...!mcnc!rti!xyzzy!hamilton Research Triangle Park, NC 27709, USA
johnl@iecc.cambridge.ma.us (John R. Levine) (05/14/91)
In article <1991May13.140634.17669@dg-rtp.dg.com> hamilton@siberia.rtp.dg.com (Eric Hamilton) writes: >Atomic operations, such as compare-and-swap, that update memory with a value >which is a function of the old value are generally useful. They allow you >to do things that cannot be done with operations such as xmem or test-and- >set, which update memory with a value that cannot be a function of the >previous value. > >Compare-and-swap cannot be fabricated out of xmem or test-and-set without >disabling interrupts or severely constraining its use. This is >unsatisfactory for user code and many places inside the operating system. This is entirely true. Everyone who has any interest in locking primitives should immediately get a copy of the January 1991 ACM TOPLAS and read the article by Maurice Herlihy entitled "Wait-free synchronization." He shows that for a realistic model problem (getting N processors to agree on the value of a shared variable) CAS can synchronize arbitrarily many processes or processors without blocking, while test-and-set or xmem can only do two. Most other locking primitives are less powerful than CAS. The only ones that are as powerful are at least as complex, e.g. atomic queue operations with peeking or atomic move from one shared memory location to another. Many people seem to believe that one can't do useful synchronization without disabled interrupts, spin locks, or some other scheme that makes one processor hang around and wait for another one to get out of its way. They are wrong. -- John R. Levine, IECC, POB 349, Cambridge MA 02238, +1 617 492 3869 johnl@iecc.cambridge.ma.us, {ima|spdcc|world}!iecc!johnl Cheap oil is an oxymoron.
kenton@abyss.zk3.dec.com (Jeff Kenton OSG/UEG) (05/14/91)
In article <GLEW.91May13102912@pdx007.intel.com>, glew@pdx007.intel.com (Andy Glew) writes: |> |> >Once you've created a lock (with xmem) you can do almost anything within the |> >locked region of code without disabling interrupts. |> |> Including being preempted, leaving all the other processors spinning |> for a lock held by a process which is swapped out. |> Several postings have convinced me on this issue. |> |> In my synchronization survey I call this the "spinning on a |> non-running processor" problem. So far, only the i860 has a general |> solution to this problem, with its operation that blocks interrupts |> for a short length of time from user mode. |> BBN's latest Butterfly was designed with this feature, although I don't think it got used. (Yes, I know they called it the TC-2000, not the Butterfly 2. What do they know.) Locking was done with xmem instead. ----------------------------------------------------------------------------- == jeff kenton Consulting at kenton@decvax.dec.com == == (617) 894-4508 (603) 881-0011 == -----------------------------------------------------------------------------
terry@venus.sunquest.com (Terry R. Friedrichsen) (05/14/91)
While we're on this subject, can somebody give me a rundown on how locking primitives cam be implemented on the MIPS RISC chips? I know that load-linked (LL) and store-conditionally (SC) are the answer for the R6000, but what about the R2000 and R3000? AdTHANKSvance. Terry R. Friedrichsen terry@venus.sunquest.com (Internet) uunet!sunquest!terry (Usenet) terry@sds.sdsc.edu (alternate address; I live in Tucson) Quote: "Do, or do not. There is no 'try'." - Yoda, The Empire Strikes Back
davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (05/14/91)
In article <1991May14.033023.26083@iecc.cambridge.ma.us> johnl@iecc.cambridge.ma.us (John R. Levine) writes: | Many people seem to believe that one can't do useful synchronization without | disabled interrupts, spin locks, or some other scheme that makes one | processor hang around and wait for another one to get out of its way. They | are wrong. Spin locking is generally less wasteful of CPU than a complete context switch to a process which is not blocked, and back again. There are only two types of lock which seem to make sense to me: for short action a spin lock, and for long stuff something like pseudo-io which blocks the waiting process and enables it back to the dispatcher when the inlock occurs. Short is anything up to 4x a context switch, since a spin lock will average half the worst case time and a block will always cost 2x context switch. -- bill davidsen (davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen) "Most of the VAX instructions are in microcode, but halt and no-op are in hardware for efficiency"
bob@tera.com (Bob Alverson) (05/14/91)
In article <1991May14.033023.26083@iecc.cambridge.ma.us> johnl@iecc.cambridge.ma.us (John R. Levine) writes: >In article <1991May13.140634.17669@dg-rtp.dg.com> hamilton@siberia.rtp.dg.com (Eric Hamilton) writes: >>Atomic operations, such as compare-and-swap, that update memory with a value >>which is a function of the old value are generally useful. They allow you >>to do things that cannot be done with operations such as xmem or test-and- >>set, which update memory with a value that cannot be a function of the >>previous value. >> >>Compare-and-swap cannot be fabricated out of xmem or test-and-set without >>disabling interrupts or severely constraining its use. This is >>unsatisfactory for user code and many places inside the operating system. > > ... > >Most other locking primitives are less powerful than CAS. The only ones that >are as powerful are at least as complex, e.g. atomic queue operations with >peeking or atomic move from one shared memory location to another. > Unfortunately, CAS is much more costly in hardware than a plain swap or a fetch and op. A load uses an address and returns a word. A store uses an address and a word. Swap is the union of load and store, taking an address and a data word and returning a data word. Fetch and op is a little more, making the data stored a function of the data read. CAS is still more, needing an extra data word which may blaze new word wide datapaths through the memory access logic. I also take issue with the idea of n threads reaching consensus by acting on a single memory location. Using CAS the software is wait free, but the hardware must serialize all the accesses. I suppose you could do a combining network (NYU style), but can't the combining be implemented in software somehow? Or is that fundamentally impossible to do without waiting? Bob
moss@cs.umass.edu (Eliot Moss) (05/14/91)
I posted something liek this recently, but would like to repeat it. I have been working with Maurice Herlihy for a little while now on problems such as lock-free garbage collection (that work will be presented at the next SPAA and will appear in an IEEE journal next year). I have a firm belief (but no proof yet) that CAS2, which allows atomic compare and swap of TWO, ARBITRARY memory locations is strictly more powerful than CAS, in the sense that it allows implementation of EFFICIENT lock-free (and wait-free) algorithms that CAS is not strong enought to implement efficiently. In particular, to update many elements of a shared structure atomically with CAS appears to require building a copy of the ENTIRE structure and then using CAS to swing a pointer to the new version. CAS2 supports a reasonable, atomic, update-in-place of any number of elements of a shared data structure. I hope to have firm results and proofs before too long. Meanwhile, it is something hardware designers may wish to start thinking about .... Eliot Moss -- J. Eliot B. Moss, Assistant Professor Department of Computer and Information Science Lederle Graduate Research Center University of Massachusetts Amherst, MA 01003 (413) 545-4206, 545-1249 (fax); Moss@cs.umass.edu
zalman@mips.com (Zalman Stern) (05/15/91)
In article <19305@sunquest.UUCP> terry@venus.sunquest.com (Terry R. Friedrichsen) writes:
[How do you do atomic locking using the MIPS-I instruction set (i.e. R[23]000).]
Either operating system support or extra hardware. Here are a few
design points:
MIPS RISC/os (and Ultrix and others) provide a system call which masks
interrupts and does a test-and-set like operation. (You can also use
SysV semaphores of course, but there are many problems there.)
Mach traps an illegal instruction to do the operation. This is faster than
a system call. They also claim to have a completely user space
algorithm that works even with preemption. I don't know the details.
SGI provides hardware on their multi-processor machines. This involves
special memory (a section of) which is mapped into a processes address
space via a system call. A load from the memory location tries to
acquire the lock and returns success/or failure as the loaded value. A
store clears the lock.
DEC SRC is doing work where the OS and compiler cooperate such that the OS
can tell when a user program was trying to acquire a lock. If
preemption occurs during a lock operation, the OS allows it to complete
(or completes the operation itself, I'm not sure). (This is for their
Topaz OS not UNIX.)
--
Zalman Stern, MIPS Computer Systems, 928 E. Arques 1-03, Sunnyvale, CA 94088
zalman@mips.com OR {ames,decwrl,prls,pyramid}!mips!zalman (408) 524 8395
"So you see, what we can do is try something new / If you're crazy too
I don't really see why can't we go on as three" Triad By David Crosby
glew@pdx007.intel.com (Andy Glew) (05/17/91)
>|> In my synchronization survey I call this the "spinning on a >|> non-running processor" problem. So far, only the i860 has a general >|> solution to this problem, with its operation that blocks interrupts >|> for a short length of time from user mode. >|> > >BBN's latest Butterfly was designed with this feature, although I don't >think it got used. (Yes, I know they called it the TC-2000, not the >Butterfly 2. What do they know.) Locking was done with xmem instead. I have the TC2000 manuals, and I believe that I looked closely for any feature that avoids the "spinning on a non-running processor" problem. There does not seem to be any hardware support for this, although if I have missed it I would be happy to be corrected. If there is support for the "spinning on a non-running processor" problem in the BBN TC2000, I conjecture that it is purely in the OS in the form of something like gang scheduling. The TC2000 does have the ability to block interrupts, but not from user mode. Again, correct me if I am wrong. The TC2000 UNIX does provide a variety of atomic synchronization primitives, like compare-and-swap, fetch-and-add, fetch-and-or, etc. These are implemented as system calls. The system call blocks interrupts, XMEM is used for the hardware lock that excludes other processors, and the operation is performed. I think that what we are all interested in is a way of doing these forms of synchronization without system calls. Spinlocks on the TC2000 must be test-and-set spinloops. You can't even use a cache test to do a test-and-test-and-set spinloop. The TC2000 manual goes into great detail about adding delays to a test-and-set spinloop to reduce contention. By the way - MIPS' load-locked/store-conditional synchronization primitives might conceivably be used to block interrupts as well as providing atomic interprocessor RMWs. Question for MIPS: are interrupts blocked inside a load-locked/store-conditional sequence? Personal preference: it's better to separate the interrupt blocking and the atomic RMW functionality. [*] thanks to someone at BBN - if you want the synchronization features of your machine to be discussed in the December, 1991, edition of the synchronization survey, tell me about it now. "Telling me about it" can comprise email about the synchronization features, although spec sheets, users manuals, etc. are always preferred (mainly because half the things people have told me by email have been wrong). Send to: My email address is glew@ichips.intel.com My office phone is 503-696-4119 Office FAX: 503-648-7397 My mail address is Andy Glew Mail stop JF1-19 Intel Corporation 5200 NE Elam Young Parkway Hillsboro, OR 97124-6497 -- Andy Glew, glew@ichips.intel.com Intel Corp., M/S JF1-19, 5200 NE Elam Young Parkway, Hillsboro, Oregon 97124-6497 This is a private posting; it does not indicate opinions or positions of Intel Corp.
bret@orac.UUCP (Bret Indrelee) (05/19/91)
>kenton@abyss.zk3.dec.com (Jeff Kenton OSG/UEG) writes: >Once you've created a lock (with xmem) you can do almost anything within the >locked region of code without disabling interrupts. > So let me see if I have this right? Instead of a Compare-And-Swap, you would compete for a lock bit, do your atomic instruction, and then release the lock bit? I would much rather have the CAS in hardware. With CAS, you can emulate the other common atomic accesses (Test and Set, Fetch and Add, parallel linked list manipulation, and the list goes on) without need for a special lock bit on the data. The importance of atomic accesses is even more important when you start putting more than one processor into a single system. I kind of like being able to manipulate linked lists without having to lock the whole list. -Bret -- ------------------------------------------------------------------------------ Bret Indrelee | <This space left intentionally blink> bret@orac.edgar.mn.org | ;^)
lkaplan@star-trek.bbn.com (Larry Kaplan) (05/28/91)
In article <GLEW.91May17100952@pdx007.intel.com> glew@pdx007.intel.com (Andy Glew) writes: >>|> In my synchronization survey I call this the "spinning on a >>|> non-running processor" problem. So far, only the i860 has a general >>|> solution to this problem, with its operation that blocks interrupts >>|> for a short length of time from user mode. >>|> >> >>BBN's latest Butterfly was designed with this feature, although I don't >>think it got used. (Yes, I know they called it the TC-2000, not the >>Butterfly 2. What do they know.) Locking was done with xmem instead. > >The TC2000 does have the ability to block interrupts, but not from >user mode. Again, correct me if I am wrong. At one time it was thought that the TC2000 could give the user the option of blocking out interrupts via something called a user augmentations register available in user space. This turned out to not work well enough for O/S related reasons. Instead, the way we currently deal with the problem is to not worry about the interrupts themselves. They are not going to cause user mode locks any problems since no interrupt code is ever going to take a user lock. The real problem, as stated before, is getting context switched as a result of an interrupt (e.g. a scheduler interrupt). To solve this, we provide various system calls for the user to disable preemption. Look at user_nopreempt(2). This allows XMEM based locks to avoid the "spinning on a non-running processor" problem. >The TC2000 UNIX does provide a variety of atomic synchronization >primitives, like compare-and-swap, fetch-and-add, fetch-and-or, etc. >These are implemented as system calls. The system call blocks >interrupts, XMEM is used for the hardware lock that excludes other >processors, and the operation is performed. This is not exactly correct. System calls are used for these atomic operations, but they are of the "fast trap" kind where no context is saved and no interrupts or other exceptions are handled during the call. Each operation is fairly short so this works well. Also, XMEMs are NOT used to implement any of these operations. The TC2000 hardware has true locking semantics on its memories and interconnection network. This works by locking a memory when a locked operation starts, and keeping it locked until the entire operation is complete. These are activated automatically by a user mode XMEM and manually in the kernel to do the other atomic operations. The general sequence for a fetch and op is: Do preparatory work like checking that the memory is resident Disable interrupts Set lock bit (lock not taken until next real memory reference) Read location (sets lock) Operate on data in registers Write result Clear locks Return original value More details of the locking protocol are available in "Inside the TC2000". > I think that what we are all interested in is a way of doing these >forms of synchronization without system calls. True enough. #include <std_disclaimer> _______________________________________________________________________________ ____ \ / ____ Laurence S. Kaplan | \ 0 / | BBN Advanced Computers lkaplan@bbn.com \____|||____/ 70 Fawcett St. (617) 873-2431 /__/ | \__\ Cambridge, MA 02238