reinhard@tristar.samsung.com (Steven Reinhardt) (06/06/91)
A while ago there was a thread on building atomic ops (specifically compare-and-swap) from lower-level primitives. Some examples were given for MIPS systems using LL and SC instructions (load-linked and store-conditionally, I believe). I'd never heard of these, but I assumed that they had been added post-R3000 for MP synchronization. Would someone be kind enough to post a description of their semantics?

(I would look it up, but
 - I have no easy access to recent MIPS references
 - maybe others will find this interesting
 - I'm personally more interested in MP synch. than IEEE FP, so maybe we can get some discussion going there.)

Thanks,

Steve
cprice@mips.com (Charlie Price) (06/07/91)
In article <24990@samsung.samsung.com> reinhard@tristar.samsung.com (Steven Reinhardt) writes:
>Some examples were given for MIPS systems using LL and SC
>instructions (load-linked and store-conditionally, I believe).
>I'd never heard of these, but I assumed that they had been
>added post-R3000 for MP synchronization. Would someone be
>kind enough to post a description of their semantics?

Since mash is out of town...

The semantics are pretty simple. Load Linked loads a word from memory into a register. A subsequent Store Conditional to that location will succeed (store the word and return a success indication) if the location has not been changed in the interim and will fail (memory unchanged, failure returned) otherwise.

These are user-mode instructions for MIPS-2 architectures: the R6000 and the presumably-to-be-announced-as-a-product-someday-soon R4000.

Details:

    6        5          5            16          field size (bits)
 _____________________________________________
|  LL  | base-reg | tgt-reg |      offset      |
 ---------------------------------------------
 _____________________________________________
|  SC  | base-reg | src-reg |      offset      |
 ---------------------------------------------

Both instructions add a 16-bit sign-extended offset to the value of a base register to form the virtual address.

Load Linked loads the addressed word from memory into the target register.

Store Conditional conditionally stores a word from the src-reg into memory. The address must be the same as that loaded by the last LL. The store will succeed (modify memory and signal success) only if the location has not been modified since it was loaded by the LL. The store will fail (not modify memory, signal failure) if the location has been modified since the LL. It will also fail if the processor has changed state to a less secure level (by executing a return-from-exception). If the location is forced from cache memory by a cache miss after the LL instruction, the SC may fail (LL-stuff-SC sequences should be careful not to cache miss).

Success is indicated by the contents of src-reg after execution of the instruction: 1 for success and 0 for failure.

For both LL and SC the hardware implements an implicit SYNC: Loads/stores issued prior to the LL or SC will complete before the LL or SC touches memory. Loads/stores issued after the LL or SC will access memory after the LL or SC touches memory.

For both instructions, caching (and coherency if supported by the processor) must be enabled for the virtual address.

Conceptually, the processor watches for changes to the physical address that was loaded by the LL and knows, for a subsequent SC, whether it has been modified (written to anyway). I think the Load Linked name came from the idea of keeping a "link" between the cached value and memory. Anything that would "sever" the link would cause a subsequent SC to fail. The link can be severed by seeing the location written to before the SC or by losing track of the address (the implementation almost certainly requires that it stay in the processor's cache).

An LL in one context must be protected from an SC in another. It is sufficient to sever the link whenever the processor goes to a less-protected state (return-from-exception).

A note on caching and coherence: For regular addresses on the MIPS processors, whether a reference to the address is cached or not is determined on a per-virtual-page basis by a bit in the TLB entry. Your typical user page is cached. LL/SC only works on locations accessed through the cache. For processors with cache coherence (the R6000A used in the R6000-based multiprocessor that CDC has been talking about and the R4000) the same thing holds true -- the coherence applied to an access is a property of the virtual-to-physical translation. LL/SC only works on locations that are marked coherent (and therefore kept up to date with memory by hardware).

--
Charlie Price cprice@mips.mips.com (408) 720-1700
MIPS Computer Systems / 928 Arques Ave. MS 1-03 / Sunnyvale, CA 94088-3650
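
The rules in the post above can be modelled in a few lines of C, which may help anyone who wants to see how the compare-and-swap from the original question falls out of LL and SC. This is only a sketch of the described semantics, not of any real implementation: the type cpu_model and the functions model_ll, model_sc, model_sever, model_remote_store and model_cas are names invented here, and the model assumes a single processor with one link, with "another agent" simulated by an explicit call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of one processor's link state. */
typedef struct {
    const uint32_t *link_addr; /* address loaded by the last LL */
    bool link_valid;           /* true while the "link" is intact */
} cpu_model;

/* LL: load the word and start watching its address. */
uint32_t model_ll(cpu_model *cpu, const uint32_t *addr) {
    cpu->link_addr = addr;
    cpu->link_valid = true;
    return *addr;
}

/* Sever the link without a write, e.g. on return-from-exception
 * or when the line is lost from the cache. */
void model_sever(cpu_model *cpu) {
    cpu->link_valid = false;
}

/* A write by some other agent: updates memory and severs the
 * link if it hits the watched address. */
void model_remote_store(cpu_model *cpu, uint32_t *addr, uint32_t value) {
    *addr = value;
    if (cpu->link_valid && cpu->link_addr == addr)
        cpu->link_valid = false;
}

/* SC: store and return 1 if the link is intact and the address
 * matches the last LL; otherwise leave memory alone and return 0. */
uint32_t model_sc(cpu_model *cpu, uint32_t *addr, uint32_t value) {
    bool ok = cpu->link_valid && cpu->link_addr == addr;
    cpu->link_valid = false; /* this model uses up the link on every SC */
    if (ok)
        *addr = value;
    return ok ? 1u : 0u;
}

/* Compare-and-swap built from the two primitives: retry only when
 * the SC fails, give up when the loaded value is not the expected one. */
bool model_cas(cpu_model *cpu, uint32_t *addr,
               uint32_t expected, uint32_t desired) {
    for (;;) {
        if (model_ll(cpu, addr) != expected)
            return false;
        if (model_sc(cpu, addr, desired) == 1u)
            return true;
    }
}
```

In this single-threaded model the SC inside model_cas never actually fails; on real hardware the retry loop is what absorbs interference between the LL and the SC.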
glew@pdx007.intel.com (Andy Glew) (06/10/91)
>A note on caching and coherence: For regular addresses on the MIPS
>processors, whether a reference to the address is cached or not is
>determined on a per-virtual-page basis by a bit in the TLB entry.
>Your typical user page is cached. LL/SC only works on locations
>accessed through the cache. For processors with cache coherence (the
>R6000A used in the R6000-based multiprocessor that CDC has been
>talking about and the R4000) the same thing holds true -- the
>coherence applied to an access is a property of the
>virtual-to-physical translation. LL/SC only works on locations that
>are marked coherent (and therefore kept up to date with memory by
>hardware).
>
>--
>Charlie Price cprice@mips.mips.com (408) 720-1700
>MIPS Computer Systems / 928 Arques Ave. MS 1-03 / Sunnyvale, CA 94088-3650

Q: what does MIPS do for synchronization through uncached memory? E.g. synchronization with an I/O device? Or are all I/O devices cache consistent?

--
Andy Glew, glew@ichips.intel.com
Intel Corp., M/S JF1-19, 5200 NE Elam Young Parkway, Hillsboro, Oregon 97124-6497
This is a private posting; it does not indicate opinions or positions of Intel Corp.
peter@nucleus.amd.com (Peter Song) (06/11/91)
In article <GLEW.91Jun9111923@pdx007.intel.com> glew@pdx007.intel.com (Andy Glew) writes:
|
| >... LL/SC only works on locations that
| >are marked coherent (and therefore kept up to date with memory by
| >hardware).
| >
| >--
| >Charlie Price cprice@mips.mips.com (408) 720-1700
|
| Q: what does MIPS do for synchronization through uncached memory?

What are the circumstances where one has to use non-cacheable memory locations for any "meaningful" (meaningful being more than just the producer/consumer relationship) synchronization, given that LL/SC works only with cacheable locations?

| E.g. synchronization with an I/O device? Or are all I/O devices cache consistent?

Only the DMA devices that access "cacheable portions of memory" in a write-back system must understand the data intervention protocol for the DMA read (read from the memory). The DMA write should be handled correctly by the caches (and let's not think about a DMA write to a dirty block). The non-DMA I/O devices can rely on the load and store instructions to maintain consistency in the memory space as the data is moved from and to the memory space. There is no use/need to maintain consistency between the memory space and the I/O space where data is produced and consumed.

|
| Andy Glew, glew@ichips.intel.com

S. Peter Song - 29K Advanced Processor Development - Advanced Micro Devices
peter@nucleus.amd.com (800) 531-5202 x54818
cprice@mips.com (Charlie Price) (06/11/91)
In article <GLEW.91Jun9111923@pdx007.intel.com> glew@pdx007.intel.com (Andy Glew) writes:
>
>>A note on caching and coherence: For regular addresses on the MIPS
>>processors, whether a reference to the address is cached or not is
>>determined on a per-virtual-page basis by a bit in the TLB entry.
>>Your typical user page is cached. LL/SC only works on locations
>>accessed through the cache. For processors with cache coherence (the
>>R6000A used in the R6000-based multiprocessor that CDC has been
>>talking about and the R4000) the same thing holds true -- the
>>coherence applied to an access is a property of the
>>virtual-to-physical translation. LL/SC only works on locations that
>>are marked coherent (and therefore kept up to date with memory by
>>hardware).
>
>Q: what does MIPS do for synchronization through uncached memory?
>E.g. synchronization with an I/O device? Or are all I/O devices
>cache consistent?

I'm not quite sure I understand the question. Do you mean synchronization in access to device registers?

LL/SC exist to provide a means for mutual exclusion among "consenting adults". Cache coherence is the mechanism that the processor has available to tell it when a shared data item has been modified by another (coherent) access in the system. LL/SC are basically only useful for lock words among processors.

If a smart I/O controller needed to participate in mutual exclusion, I guess you would have to make the answer up depending on what the I/O controller and/or its memory were able to do. The devices for this system are probably on the VME bus, so any onboard memory they had wouldn't know about our coherency.

I/O (DMA) is (normally) coherent. Cache coherence is done by two parts:

1) The cache controller (in this case, on chip like all MIPS processors) that issues requests (like "I want to read this cache line and own it") and can service external requests (like "invalidate this location").

2) Some external mechanism to accept requests from the processors and service requests for coherent memory operations. In our case, this is a custom snoopy bus-interface chip.

The memory and I/O adapter cards are hooked up to the system backplane with Bus Interface Chips, so I/O operations can be made cache coherent.

I hope I managed to stumble across what you were actually asking -- if not, try again.

--
Charlie Price cprice@mips.mips.com (408) 720-1700
MIPS Computer Systems / 928 Arques Ave. MS 1-03 / Sunnyvale, CA 94088-3650
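
To make the two-part picture above concrete, here is a toy C model of two caches sharing one lock word over a snooped bus, showing how a "read and own" request from one processor invalidates the other's copy and thereby makes its pending SC fail. Everything in it is an assumption made for illustration: the three line states, the names bus_model, cache_model, bus_request, cpu_ll, cpu_sc, cpu_store and try_lock, and the single-word memory. It is not a description of the actual MIPS bus-interface chip or of the R4000 protocol.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NCPU 2

/* Hypothetical, simplified line states. */
typedef enum { LINE_INVALID, LINE_SHARED, LINE_OWNED } line_state;

typedef struct {
    line_state state; /* state of this cache's copy of the lock word */
    uint32_t data;    /* cached value */
    bool link;        /* set by LL, cleared when the copy is invalidated */
} cache_model;

typedef struct {
    uint32_t memory;          /* the one lock word in main memory */
    cache_model cache[NCPU];
} bus_model;

/* Bus side: every other cache snoops the request, writes back an
 * owned copy, and gives the line up if ownership is requested. */
void bus_request(bus_model *bus, int requester, bool exclusive) {
    for (int i = 0; i < NCPU; i++) {
        cache_model *c = &bus->cache[i];
        if (i == requester || c->state == LINE_INVALID)
            continue;
        if (c->state == LINE_OWNED)
            bus->memory = c->data;   /* write back before sharing */
        if (exclusive) {
            c->state = LINE_INVALID; /* "invalidate this location" */
            c->link = false;         /* losing the line severs the link */
        } else {
            c->state = LINE_SHARED;
        }
    }
    cache_model *r = &bus->cache[requester];
    if (r->state == LINE_INVALID)
        r->data = bus->memory;       /* fill on a miss */
    r->state = exclusive ? LINE_OWNED : LINE_SHARED;
}

/* LL: fetch the line if needed, set the link, return the value. */
uint32_t cpu_ll(bus_model *bus, int cpu) {
    cache_model *c = &bus->cache[cpu];
    if (c->state == LINE_INVALID)
        bus_request(bus, cpu, false);
    c->link = true;
    return c->data;
}

/* SC: fails (returns 0) if the link was severed; otherwise obtains
 * ownership, writes the cached copy, and returns 1. */
uint32_t cpu_sc(bus_model *bus, int cpu, uint32_t value) {
    cache_model *c = &bus->cache[cpu];
    if (!c->link)
        return 0;
    if (c->state != LINE_OWNED)
        bus_request(bus, cpu, true); /* "read this line and own it" */
    c->data = value;
    c->link = false;
    return 1;
}

/* Ordinary store, used here to release the lock. */
void cpu_store(bus_model *bus, int cpu, uint32_t value) {
    cache_model *c = &bus->cache[cpu];
    if (c->state != LINE_OWNED)
        bus_request(bus, cpu, true);
    c->data = value;
}

/* One attempt to take the lock word (0 = free, 1 = held). */
bool try_lock(bus_model *bus, int cpu) {
    if (cpu_ll(bus, cpu) != 0)
        return false;
    return cpu_sc(bus, cpu, 1) == 1;
}
```

If both processors LL the free lock word and processor 1 then completes its SC, the ownership request invalidates processor 0's copy, so processor 0's SC returns 0 and it has to start over with a fresh LL.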
bret@orac.UUCP (Bret Indrelee) (06/13/91)
In article <1991Jun10.182119.5523@dvorak.amd.com> peter@nucleus.amd.com (Peter Song) writes:
>In article <GLEW.91Jun9111923@pdx007.intel.com> glew@pdx007.intel.com (Andy Glew) writes:
>|
>| >... LL/SC only works on locations that
>| >are marked coherent (and therefore kept up to date with memory by
>| >hardware).
>| >
>| >--
>| >Charlie Price cprice@mips.mips.com (408) 720-1700
>|
>| Q: what does MIPS do for synchronization through uncached memory?
>
>What are the circumstances where one has to use non-cacheable memory locations for any
>"meaningful" (meaningful being more than just the producer/consumer relationship)
>synchronization, given that LL/SC works only with cacheable locations?

What if you had a non-cacheable, shared memory on the system bus? It is possible to make a shared memory that links multiple systems to a single coherent memory image. Not all the processors sharing the memory would be in a single chassis.

If you had a bus that depended on snooping/snarfing to maintain the cache-coherence, you would have to mark this shared memory as non-cacheable. (It can be modified without a local bus access being generated, so there is nothing to snoop/snarf.)

-Bret
--
------------------------------------------------------------------------------
Bret Indrelee | Our mail is still somewhat unreliable. Sorry.
uunet.uu.net!cs.umn.edu!kksys!edgar!orac!bret -And still trying
peter@nucleus.amd.com (Peter Song) (06/15/91)
| >What are the circumstances where one has to use non-cacheable memory locations for any
| >"meaningful" (meaningful being more than just the producer/consumer relationship)
| >synchronization, given that LL/SC works only with cacheable locations?
|
| What if you had a non-cacheable, shared memory on the system bus. It is
| possible to make a shared memory that links multiple systems to a single
| coherent memory image. Not all the processors sharing the memory would
| be on a single chassis.

Let's not confuse the mechanisms for synchronization with the mechanisms for cache coherency - the two are not the same. All copies of a synchronization variable (i.e. the location used to achieve some form of synchronization) must be kept consistent at all times, whereas all copies of a shared location not used for synchronization may be kept consistent "only when necessary." The less efficient it is to notify all copies of shared data of a change, the less often a system SHOULD try to maintain consistency. Unfortunately, most shared memory multiprocessors do not distinguish between synchronization variables and shared data, and apply the same consistency rules to both.

|
| If you had a bus that depended on snooping/snarfing to maintain the
| cache-coherence, you would have to mark this shared memory as non-cacheable.
| (It can be modified without a local bus access being generated, so there
| is nothing to snoop/snarf.)

Have you heard of the "shared" bit (circa 1985 or earlier)?

S. Peter Song - 29K Advanced Processor Development - Advanced Micro Devices
peter@nucleus.amd.com (800) 531-5202 x54818