[comp.arch] Living With Old Baggage

lindsay@gandalf.cs.cmu.edu (Donald Lindsay) (04/22/91)

In article <1991Apr12.001324.5@ux1.cso.uiuc.edu>
	msp33327@uxa.cso.uiuc.edu (Michael S. Pereckas) writes:
>The problem is that you will have to include those instructions in the
>next version of your processor in order to remain binary compatible,
>and if things work out differently, as they probably will, they may be
>hard to implement.

In article <336@scorpio.gtephx.UUCP> robertsw@scorpio.UUCP
	(Wild Rider) gives the 68020 "callm"/"rtm" as examples.

An example they weren't able to dump was "cas2". This fetches two
independant words, and then conditionally overwrites them, with one
big R/M/W lock on memory during the whole 4 memory cycles.

Since each of these words could be unaligned, there could be four
page faults just in fetching them. (Let's not even talk about
write-protected/copy-on-write pages.) The 68040 team didn't want to
leave memory locked for a potentially long period, and yet releasing
the lock logically implies a full restart. So, before attempting the
reads, the 68040 insures that all 1-4 mappings are in the TLB and
valid. That's not as trivial as it may sound: the design has to allow
for (say) the 4th mapping bumping the 3rd one back out of the TLB.

The chip works, but I believe they regret getting themselves into
this.
-- 
Don		D.C.Lindsay 	Carnegie Mellon Robotics Institute

jcallen@Encore.COM (Jerry Callen) (04/22/91)

In article <12741@pt.cs.cmu.edu> lindsay@gandalf.cs.cmu.edu (Donald Lindsay) writes:

>[regarding 68020 instructions that Moto had to carry over to the 68040]
>
>An example they weren't able to dump was "cas2". This fetches two
>independant words, and then conditionally overwrites them, with one
>big R/M/W lock on memory during the whole 4 memory cycles.
>
>Since each of these words could be unaligned, there could be four
>page faults just in fetching them.

cas2 is a horrible case of overkill, but a "reasonable" compare and swap
instruction would be AWFULLY handy. By "reasonable" I mean an instruction
that:

- fetches an aligned word from memory
- compares it to a register
- if the values match, the value in another register is stored back
- a condition code is set somewhere that indicates whether or not the
  store was done.

The ancient IBM 370 offered essentially this instruction; the OS used it
extensively, and I used it to implement locks at user level. It was 
sure handy!

Yes, this is VERY complex by RISC standards, but it makes a number of
nasty synchronization problems almost trivially easy. You can sorta-kinda
emulate compare and swap using test and set (which the 88K has) and spin
locks, but to do so in complete safety you have to disable interrupts
(nasty from user code...).

What makes a "reasonable" compare and swap so hard to implement that
current RISCs don't do it? It seems to me that implementing test and
set already forces MOST of the hard work to be done anyway.

-- Jerry Callen
   jcallen@encore.com

kenton@abyss.zk3.dec.com (Jeff Kenton OSG/UEG) (04/23/91)

In article <14628@encore.Encore.COM>, jcallen@Encore.COM (Jerry Callen) writes:

|> 
|> Yes, this is VERY complex by RISC standards, but it makes a number of
|> nasty synchronization problems almost trivially easy. You can sorta-kinda
|> emulate compare and swap using test and set (which the 88K has) and spin
|> locks, but to do so in complete safety you have to disable interrupts
|> (nasty from user code...).
|> 
|> -- Jerry Callen
|>    jcallen@encore.com

The 88K has xmem, not test and set, which is guaranteed to be an atomic
operation (the bus is locked between the read and write cycles).  This
can be used to create locks or compare and swap without disabling itnerrupts.

Go look at the locking primitives in Encore's 88K kernel.  They are built
around xmem's. (I was there.)


-----------------------------------------------------------------------------
==	jeff kenton		Consulting at kenton@decvax.dec.com        ==
==	(617) 894-4508			(603) 881-0011			   ==
-----------------------------------------------------------------------------

peter@nucleus.amd.com (Peter Song) (04/24/91)

In article <14628@encore.Encore.COM> jcallen@encore.Com (Jerry Callen) writes:
| cas2 is a horrible case of overkill, but a "reasonable" compare and swap
| instruction would be AWFULLY handy. By "reasonable" I mean an instruction
| that:
| 
| - fetches an aligned word from memory
| - compares it to a register
| - if the values match, the value in another register is stored back
| - a condition code is set somewhere that indicates whether or not the
|   store was done.
| 
| The ancient IBM 370 offered essentially this instruction; the OS used it
| extensively, and I used it to implement locks at user level. It was 
| sure handy!
| 
| Yes, this is VERY complex by RISC standards, but it makes a number of
| nasty synchronization problems almost trivially easy. You can sorta-kinda
| emulate compare and swap using test and set (which the 88K has) and spin
| locks, but to do so in complete safety you have to disable interrupts
| (nasty from user code...).
| 
| What makes a "reasonable" compare and swap so hard to implement that
| current RISCs don't do it? It seems to me that implementing test and
| set already forces MOST of the hard work to be done anyway.
| 
| -- Jerry Callen
|    jcallen@encore.com

An efficient compare-and-swap (CAS) function can be provided in RISC architectures 
without providing one instruction that does it.  I'll give an example using the
load linked (LL) and store conditionally (SC) instructions in the MIPS II architecture.

The LL reads from a memory location and sets a status bit.  The SC completes the write
only if the status bit is still set - the status bit gets cleared when a SC writes to
the location associated with the status bit.  The following code provides the
compare-and-swap(old,new,match) == compare-and-swap(T0,T1,T2) function:

	L:	LL	T4,(T0)		;T0 has the address of old
		BNE	T4,T2,SKIP	;if old != match, skip
		NOP
		SC	T1,(T0)		;write new atomically!!
					;SC is success only if T1 is 1 (I think)
		...
	SKIP:				;store in compare-and-swap didn't happen

If several processes simultaneously execute the compare-and-swap, only one process
that first executes the SC instruction will succeed.

On a side note, I think the only real benefit of the compare-and-swap over the more
simpler test-and-set is that it semantically defines when the write is to take place.
This definition reduces the number of writes in spinlock that require coherency actions
in cache-based systems.  This is not to say that the test-and-set can't be implemented
in the ways that eliminate the unnecessary writes.  For instance, the Am29K has the
LOADSET, a test-and-set type of instruction, that defines -1 to be the "set" value.
Since the "set" value is defined in hardware, the LOADSET can check the lock and not
start the write if the lock is already set.  This is possible only if the instruction
defines the "set" value.


-Peter
--
S. Peter Song - 29K Advanced Processor Development - Advanced Micro Devices
peter@nucleus.amd.com                                (800) 531-5202 x54818