[comp.arch] MIPS LL & SC instrs

reinhard@tristar.samsung.com (Steven Reinhardt) (06/06/91)

A while ago there was a thread on building atomic ops
(specifically compare-and-swap) from lower-level primitives.
Some examples were given for MIPS systems using LL and SC
instructions (load-linked and store-conditionally, I believe).
I'd never heard of these, but I assumed that they had been
added post-R3000 for MP synchronization.  Would someone be
kind enough to post a description of their semantics?

(I would look it up, but
 - I have no easy access to recent MIPS references
 - maybe others will find this interesting
 - I'm personally more interested in MP synch. than IEEE FP, so
   maybe we can get some discussion going there.)

Thanks,

Steve

cprice@mips.com (Charlie Price) (06/07/91)

In article <24990@samsung.samsung.com> reinhard@tristar.samsung.com (Steven Reinhardt) writes:
>Some examples were given for MIPS systems using LL and SC
>instructions (load-linked and store-conditionally, I believe).
>I'd never heard of these, but I assumed that they had been
>added post-R3000 for MP synchronization.  Would someone be
>kind enough to post a description of their semantics?

Since mash is out of town...

The semantics are pretty simple.
Load Linked loads a word from memory into a register.
A subsequent Store Conditional to that location will
succeed (store the word and return a success indication)
if the location has not been changed in the interim
and will fail (memory unchanged, failure returned) otherwise.
These are user-mode instructions for MIPS-2 architectures:
the R6000 and
the presumably-to-be-announced-as-a-product-someday-soon R4000.
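
To make the retry pattern concrete, here is a minimal sketch in C. C has no
direct LL/SC, so this uses the C11 atomic_compare_exchange_weak call as a
stand-in: like SC it reports success or failure, it is allowed to fail
spuriously, and compilers for LL/SC machines typically build it from an
LL/SC loop. The helper name fetch_and_add_retry is made up for illustration
and does not come from any MIPS documentation.

```c
#include <stdatomic.h>

/* Hypothetical helper: atomically add `delta` to *word and return the
 * old value.  The load plays the role of LL; the weak compare-exchange
 * plays the role of SC and is retried whenever it reports failure. */
int fetch_and_add_retry(atomic_int *word, int delta)
{
    int old = atomic_load(word);            /* "LL": read the current value */

    while (!atomic_compare_exchange_weak(word, &old, old + delta)) {
        /* "SC" failed: `old` was refreshed with the current value,
         * so the loop recomputes old + delta and tries again. */
    }
    return old;
}
```

A caller would write something like `fetch_and_add_retry(&counter, 1)` and
never see the failed attempts, only the value the counter held before the
successful update.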

Details:

   6        5         5           16            field size
_____________________________________________
| LL  | base-reg | tgt-reg |     offset     |
---------------------------------------------
_____________________________________________
| SC  | base-reg | src-reg |     offset     |
---------------------------------------------

Both instructions add a 16-bit sign-extended offset
to the value of a base register to formulate the virtual address.
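
As a rough illustration of the layout and the address calculation, the C
sketch below pulls the four fields out of a 32-bit instruction word and
forms the virtual address. It assumes the leftmost field in the diagram
occupies the most significant bits; the struct and function names are
invented for this example.

```c
#include <stdint.h>

/* Fields of an LL or SC instruction word, using the 6/5/5/16 layout. */
struct llsc_fields {
    unsigned opcode;   /* bits 31..26: LL or SC */
    unsigned base;     /* bits 25..21: base register number */
    unsigned rt;       /* bits 20..16: tgt-reg (LL) or src-reg (SC) */
    int32_t  offset;   /* bits 15..0, sign-extended */
};

struct llsc_fields llsc_decode(uint32_t word)
{
    struct llsc_fields f;
    uint32_t raw = word & 0xFFFFu;

    f.opcode = (word >> 26) & 0x3Fu;
    f.base   = (word >> 21) & 0x1Fu;
    f.rt     = (word >> 16) & 0x1Fu;
    /* Portable sign extension of the low 16 bits. */
    f.offset = (int32_t)(raw ^ 0x8000u) - 0x8000;
    return f;
}

/* Virtual address = base register value + sign-extended offset. */
uint32_t llsc_effective_address(uint32_t base_value, int32_t offset)
{
    return base_value + (uint32_t)offset;   /* wraps modulo 2^32 */
}
```

With an offset field of 0xFFFC the decoded offset is -4, so a base register
holding 0x1000 gives the address 0x0FFC.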

Load Linked loads the addressed word from memory into the target register.

Store Conditional conditionally stores a word from the
src-reg into memory.
The address must be the same as that loaded by the last LL.
The store will succeed (modify memory and signal success)
only if the location has not been modified since it was
loaded by the LL.
The store will fail (not modify memory, signal failure)
if the location has been modified since the LL.
It will also fail if the processor has changed state
to a less secure level (by executing a return-from-exception).
If the location is forced from cache memory by a cache miss
after the LL instruction, the SC may fail
(LL-stuff-SC sequences should be careful not to cache miss).
Success is indicated by the contents of src-reg after execution
of the instruction:  1 for success and 0 for failure.

For both LL and SC the hardware implements an implicit SYNC:
Loads/stores issued prior to the LL or SC will complete before the
LL or SC touches memory.
Loads/stores issued after the LL or SC will access memory after the
LL or SC touches memory.

For both instructions,
caching (and coherency if supported by the processor)
must be enabled for the virtual address.

Conceptually, the processor watches for changes to the physical
address that was loaded by the LL and knows, for a subsequent SC,
whether it has been modified (written to anyway).
I think the Load Linked name came from the idea of keeping
a "link" between the cached value and memory.
Anything that would "sever" the link would cause a
subsequent SC to fail.
The link can be severed by seeing the location written to
before the SC or by losing track of the address (the
implementation almost certainly requires that it stay
in the processor's cache).
A LL in one context must be protected from an SC in another.
It is sufficient to sever the link whenever the processor
goes to a less-protected state (return-from-exception).
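
The "link" idea can be sketched as a toy software model in C. This is only
a model of the behavior described here, not how any MIPS implementation is
built: it tracks one processor, treats every eviction of the linked word as
a failure (the hardware only "may" fail), and assumes each SC uses up the
link. All names are invented for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define MEM_WORDS 16

/* Toy model: a small word-addressed memory plus one processor's link. */
struct model {
    uint32_t mem[MEM_WORDS];
    bool     link_valid;   /* is there a live link? */
    unsigned link_addr;    /* word index loaded by the last LL */
};

/* LL: load the word and establish the link. */
uint32_t model_ll(struct model *m, unsigned addr)
{
    m->link_valid = true;
    m->link_addr  = addr;
    return m->mem[addr];
}

/* SC: store only if the link to `addr` is intact; returns 1 or 0. */
int model_sc(struct model *m, unsigned addr, uint32_t value)
{
    bool ok = m->link_valid && m->link_addr == addr;

    if (ok)
        m->mem[addr] = value;
    m->link_valid = false;      /* model assumption: one SC per LL */
    return ok ? 1 : 0;
}

/* A write from elsewhere severs the link, even if the value is the same. */
void model_remote_write(struct model *m, unsigned addr, uint32_t value)
{
    m->mem[addr] = value;
    if (m->link_valid && m->link_addr == addr)
        m->link_valid = false;
}

/* Losing track of the linked word (e.g. cache eviction) severs the link. */
void model_evict(struct model *m, unsigned addr)
{
    if (m->link_valid && m->link_addr == addr)
        m->link_valid = false;
}

/* Return-from-exception severs the link unconditionally. */
void model_return_from_exception(struct model *m)
{
    m->link_valid = false;
}

/* Compare-and-swap built from the model's LL and SC; returns 1 if swapped. */
int model_cas(struct model *m, unsigned addr, uint32_t expected,
              uint32_t desired)
{
    for (;;) {
        if (model_ll(m, addr) != expected)
            return 0;                   /* value differs: no swap */
        if (model_sc(m, addr, desired))
            return 1;                   /* SC succeeded */
        /* SC failed: link was severed in between, so retry. */
    }
}
```

In this model an LL followed by model_remote_write, model_evict, or
model_return_from_exception makes the next model_sc return 0 and leave
memory untouched, which is the failure case described above.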

A note on caching and coherence:
For regular addresses on the MIPS processors,
whether a reference to the address is cached or not
is determined on a per-virtual-page basis by a bit in the
TLB entry.  Your typical user page is cached.
LL/SC only works on locations accessed through the cache.
For processors with cache coherence
(The R6000A used in the R6000-based multiprocessor that CDC
has been talking about and the R4000)
the same thing holds true -- the coherence applied to an access
is a property of the virtual-to-physical translation.
LL/SC only works on locations that are marked coherent
(and therefore kept up to date with memory by hardware).

-- 
Charlie Price    cprice@mips.mips.com        (408) 720-1700
MIPS Computer Systems / 928 Arques Ave.  MS 1-03 / Sunnyvale, CA   94088-3650

glew@pdx007.intel.com (Andy Glew) (06/10/91)

>A note on caching and coherence: For regular addresses on the MIPS
>processors, whether a reference to the address is cached or not is
>determined on a per-virtual-page basis by a bit in the TLB entry.
>Your typical user page is cached.  LL/SC only works on locations
>accessed through the cache.  For processors with cache coherence (The
>R6000A used in the R6000-based multiprocessor that CDC has been
>talking about and the R4000) the same thing holds true -- the
>coherence applied to an access is a property of the
>virtual-to-physical translation.  LL/SC only works on locations that
>are marked coherent (and therefore kept up to date with memory by
>hardware).

Q: what does MIPS do for synchronization through uncached memory?
E.g. synchronization with an I/O device?  Or are all I/O devices 
cache consistent?
--

Andy Glew, glew@ichips.intel.com
Intel Corp., M/S JF1-19, 5200 NE Elam Young Parkway, 
Hillsboro, Oregon 97124-6497

This is a private posting; it does not indicate opinions or positions
of Intel Corp.

peter@nucleus.amd.com (Peter Song) (06/11/91)

In article <GLEW.91Jun9111923@pdx007.intel.com> glew@pdx007.intel.com (Andy Glew) writes:
| 
| >... LL/SC only works on locations that
| >are marked coherent (and therefore kept up to date with memory by
| >hardware).
| 
| Q: what does MIPS do for synchronization through uncached memory?

What are the circumstances where one has to use non-cacheable memory locations for any
"meaningful" (meaningful being more than just the producer/consumer relationship)
synchronization, given that LL/SC works only with cacheable locations?

| E.g. synchronization with an I/O device?  Or are all I/O devices cache consistent?

Only the dma devices that access "cacheable portions of memory" in a write-back system
must understand the data intervention protocol for the dma read (read from the memory).
The dma write should be handled correctly by the caches (and let's not think about a
dma write to a dirty block).
The non-dma i/o devices can rely on the load and store instructions to maintain
consistency in the memory space as the data is moved from and to the memory space.
There is no use/need to maintain consistency between the memory space and the io space
where data is produced and consumed.

| 
| Andy Glew, glew@ichips.intel.com

S. Peter Song - 29K Advanced Processor Development - Advanced Micro Devices
peter@nucleus.amd.com                                (800) 531-5202 x54818

cprice@mips.com (Charlie Price) (06/11/91)

In article <GLEW.91Jun9111923@pdx007.intel.com> glew@pdx007.intel.com (Andy Glew) writes:
>
>>A note on caching and coherence: For regular addresses on the MIPS
>>processors, whether a reference to the address is cached or not is
>>determined on a per-virtual-page basis by a bit in the TLB entry.
>>Your typical user page is cached.  LL/SC only works on locations
>>accessed through the cache.  For processors with cache coherence (The
>>R6000A used in the R6000-based multiprocessor that CDC has been
>>talking about and the R4000) the same thing holds true -- the
>>coherence applied to an access is a property of the
>>virtual-to-physical translation.  LL/SC only works on locations that
>>are marked coherent (and therefore kept up to date with memory by
>>hardware).
>
>Q: what does MIPS do for synchronization through uncached memory?
>E.g. synchronization with an I/O device?  Or are all I/O devices 
>cache consistent?

I'm not quite sure I understand the question.
Do you mean synchronization in access to device registers?

LL/SC exist to provide a means for mutual exclusion among "consenting adults".
Cache coherence is the mechanism that the processor has available to
tell it when a shared data item has been modified by another (coherent)
access in the system.
LL/SC are basically only useful for lock words among processors.
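
A lock word of this kind might look like the following C sketch. It uses
C11 atomics as a portable stand-in for an LL/SC sequence (on an LL/SC
machine the compare-exchange would typically be compiled into one). The
type and function names, and the 0 = free / 1 = held encoding, are
assumptions made for the example.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock word: 0 = free, 1 = held. */
typedef struct {
    atomic_int word;
} lock_word;

/* One attempt: succeeds only if the word was 0 and we changed it to 1. */
bool lock_try(lock_word *l)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&l->word, &expected, 1);
}

/* Spin until the lock is obtained. */
void lock_acquire(lock_word *l)
{
    while (!lock_try(l)) {
        /* another "consenting adult" holds the lock; keep trying */
    }
}

/* Release by storing 0; only the current holder should call this. */
void lock_release(lock_word *l)
{
    atomic_store(&l->word, 0);
}
```

The scheme only works if every participant goes through lock_acquire and
lock_release; nothing stops a party that ignores the lock word from
touching the protected data.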

If a smart I/O controller needed to participate in mutual exclusion,
I guess you would have to make the answer up depending on what the
I/O controller and/or its memory were able to do.
The devices for this system are probably on the VME bus, so any
onboard memory they had wouldn't know about our coherency.

I/O (DMA) is (normally) coherent.
Cache coherence is done by two parts:
  1) The cache controller (in this case, on chip like all MIPS processors)
     that issues requests (like "I want to read this cache line and own it")
     and can service external requests (like "invalidate this location").
  2) Some external mechanism to accept requests from the processors
     and service requests for coherent memory operations.
     In our case, this is a custom snoopy bus-interface chip.

The memory and I/O adapter cards are hooked up to the system backplane
with Bus Interface Chips, so I/O operations can be made cache coherent.

I hope I managed to stumble across what you were actually
asking -- if not, try again.

-- 
Charlie Price    cprice@mips.mips.com        (408) 720-1700
MIPS Computer Systems / 928 Arques Ave.  MS 1-03 / Sunnyvale, CA   94088-3650

bret@orac.UUCP (Bret Indrelee) (06/13/91)

In article <1991Jun10.182119.5523@dvorak.amd.com> peter@nucleus.amd.com (Peter Song) writes:
>In article <GLEW.91Jun9111923@pdx007.intel.com> glew@pdx007.intel.com (Andy Glew) writes:
>| 
>| >... LL/SC only works on locations that
>| >are marked coherent (and therefore kept up to date with memory by
>| >hardware).
>| 
>| Q: what does MIPS do for synchronization through uncached memory?
>
>What are the circumstances where one has to use non-cacheable memory locations for any
>"meaningful" (meaningful being more than just the producer/consumer relationship)
>synchronization, given that LL/SC works only with cacheable locations?


What if you had a non-cacheable, shared memory on the system bus?  It is
possible to make a shared memory that links multiple systems to a single
coherent memory image.  Not all the processors sharing the memory would
be on a single chassis.

If you had a bus that depended on snooping/snarfing to maintain the
cache-coherence, you would have to mark this shared memory as non-cacheable.
(It can be modified without a local bus access being generated, so there
is nothing to snoop/snarf.)

-Bret
-- 
------------------------------------------------------------------------------
Bret Indrelee		|	Our mail is still somewhat unreliable.  Sorry.
uunet.uu.net!cs.umn.edu!kksys!edgar!orac!bret		-And still trying

peter@nucleus.amd.com (Peter Song) (06/15/91)

| >What are the circumstances where one has to use non-cacheable memory locations for any
| >"meaningful" (meaningful being more than just the producer/consumer relationship)
| >synchronization, given that LL/SC works only with cacheable locations?
| 
| What if you had a non-cacheable, shared memory on the system bus.  It is
| possible to make a shared memory that links multiple systems to a single
| coherent memory image.  Not all the processors sharing the memory would
| be on a single chassis.

Let's not confuse the mechanisms for synchronization with the mechanisms for
cache coherency - the two are not the same.  All copies of a synchronization variable
(ie. the location used to achieve some form of synchronization) must be kept consistent
at all times, whereas all copies of a shared location not used for synchronization
may be kept consistent "only when necessary."  The less efficient it is to propagate a
change of shared data to all its copies, the less often a system SHOULD try to maintain
consistency.  Unfortunately, most shared memory multiprocessors do not distinguish between
synchronization variables and shared data, and apply the same consistency rules to both.

| 
| If you had a bus that depended on snooping/snarfing to maintain the
| cache-coherence, you would have to mark this shared memory as non-cacheable.
| (It can be modified without a local bus access being generated, so there
| is nothing to snoop/snarf.)

Have you heard of the "shared" bit (circa 1985 or earlier)?

S. Peter Song - 29K Advanced Processor Development - Advanced Micro Devices
peter@nucleus.amd.com                                (800) 531-5202 x54818