[comp.unix.wizards] Information on SPARC assembly

rp@osc.COM (Rich Patterson) (06/13/89)

Hi,
	I need some help in finding information on coding atomic bit
operations on a Sun-4 (SPARC).  I wasn't able to find a reference to a
"Test and Set" operation in the Assembly guide that comes with our Sun-4.
	Are there any other references on SPARC assembly, Sun published or
otherwise ?  Any help or code would be appreciated.  Please e-mail to the
address below.

Thanks,
Rich P.
rp@osc.com
pacbell!osc!rp

dg@lakart.UUCP (David Goodenough) (06/16/89)

rp@osc.COM (Rich Patterson) sez:
> Hi,
> 	I need some help in finding information on coding atomic bit
> operations on a Sun-4 (SPARC).  I wasn't able to find a reference to a
> "Test and Set" operation in the Assembly guide that comes with our Sun-4.
> 	Are there any other references on SPARC assembly, Sun published or
> otherwise ?  Any help or code would be appreciated.  Please e-mail to the
> address below.

I have never understood the need for a test and set instruction, when
you can make do with adc (add with carry). Allow me to explain:

The point behind TAS is to allow a process to test if a flag is set or
clear, and set it no matter what the result. But why does the test have
to be in the same instruction? In fact all that is needed is the ability
to capture the state of a bit, setting it as you do the capture, and
test it later. If you think about the following:

	Bit clear (i.e. resource available)

	Task 1 grabs a copy of the bit and sets it, but does not test it.

	Bit is now set

	Task 1 gets swapped out, and Task 2 runs

	Task 2 grabs a copy of the bit, and sets it again.

	Task 2 tests the copy it captured - finds it was set, and assumes
		the resource is not available.

	Task 2 gets swapped out, and Task 1 comes back

	Task 1 tests its copy of the bit, finds it was clear, and proceeds
		to use the resource.

Note that Task 1 got interrupted between sampling the bit, and testing it,
_BUT_IT_DIDN'T_MAKE_ANY_DIFFERENCE_ - the system still worked.

So the bottom line is all you need is the ability to capture the state
of a bit, and set it no matter what, all in one atomic instruction.
Add with carry works just nicely to do this:

Put the flag in memory as a whole byte and initialize it to 0x7f (i.e.
all bits set, except the MS bit is clear) - the flag is now clear (resource
is available). To get and set the bit do the following:

	set carry
	add with carry		flag, flag
	jump on carry clear	resource available

Now if you break this sequence anywhere, it is still secure. Note that it
assumes you can adc memory,memory - if you can't, look for a rotate left
instruction, which does about the same thing.

To release the resource, simply move 0x7f to the flag byte after you've
finished with the resource - that is trivial.
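
The arithmetic behind the 0x7f trick can be checked with a small C model.
This is only a sketch of what the byte and the carry flag would do; the
C code itself is NOT atomic, and the names FLAG_FREE, adc_acquire,
adc_release and adc_demo are invented here for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define FLAG_FREE 0x7f  /* initial value from the post: MS bit clear */

/* Models "set carry; adc flag, flag" on one byte (not atomic in C).
 * Returns the carry out: 0 means the resource was free and is now ours. */
int adc_acquire(uint8_t *flag)
{
    unsigned sum = (unsigned)*flag + *flag + 1u;  /* flag + flag + carry-in */
    *flag = (uint8_t)sum;                         /* low 8 bits written back */
    return (int)((sum >> 8) & 1u);                /* carry out of bit 7 */
}

/* Models "move 0x7f to the flag byte". */
void adc_release(uint8_t *flag)
{
    *flag = FLAG_FREE;
}

/* Replays the Task 1 / Task 2 interleaving: both capture first, test later. */
void adc_demo(void)
{
    uint8_t flag = FLAG_FREE;
    int task1_carry = adc_acquire(&flag);  /* Task 1 captures, does not test */
    int task2_carry = adc_acquire(&flag);  /* Task 2 captures and sets again */
    assert(task2_carry == 1);              /* Task 2: carry set, busy */
    assert(task1_carry == 0);              /* Task 1: carry clear, proceeds */
    assert(flag == 0xff);                  /* flag stays "set" */
    adc_release(&flag);
}
```

The first add gives 0x7f + 0x7f + 1 = 0xff with carry clear; every later
add gives 0xff + 0xff + 1 = 0x1ff, i.e. 0xff again with carry set.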
-- 
	dg@lakart.UUCP - David Goodenough		+---+
						IHS	| +-+-+
	....... !harvard!xait!lakart!dg			+-+-+ |
AKA:	dg%lakart.uucp@xait.xerox.com		  	  +---+

m5@lynx.uucp (Mike McNally) (06/20/89)

In article <577@lakart.UUCP> dg@lakart.UUCP (David Goodenough) writes:
>I have never understood the need for a test and set instruction, when
>you can make do with adc (add with carry). Allow me to explain:
>
>The point behind TAS is to allow a process to test if a flag is set or
>clear, and set it no matter what the result. But why does the test have
>to be in the same instruction? 

The example given by Mr. Goodenough in fact incorporates the changing of 
the state of the flag in one instruction (the add-with-carry).  It is
thus true that the sequence is unbreakable *at the OS level*: a normal
OS will not reschedule while a task is in the middle of an instruction,
because most CPUs won't allow interrupts in the middle of an instruction.
(Note that this is not necessarily the case.)  A real TAS instruction
often comes with the proviso that the bus cycles used to fetch and store
are not interruptible either.  This guarantee is necessary in a
multi-processor environment.
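
For comparison, C11's <stdatomic.h> offers a test-and-set that is
specified to be atomic across threads, whichever processors they run on.
A minimal sketch, assuming a C11 compiler; the names resource_busy,
try_acquire, release_resource and tas_demo are made up for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock flag: clear means the resource is available. */
atomic_flag resource_busy = ATOMIC_FLAG_INIT;

/* Atomically sets the flag and reports whether it was clear before,
 * i.e. returns true if the caller now owns the resource. */
bool try_acquire(void)
{
    return !atomic_flag_test_and_set(&resource_busy);
}

/* Clears the flag so that another caller can take the resource. */
void release_resource(void)
{
    atomic_flag_clear(&resource_busy);
}

/* Single-threaded walk-through of the expected behaviour. */
void tas_demo(void)
{
    assert(try_acquire());    /* first caller wins */
    assert(!try_acquire());   /* second caller sees it busy */
    release_resource();
    assert(try_acquire());    /* available again after release */
    release_resource();
}
```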

I think that the x86 (x>0) series locks the bus on all XCHG instructions.
The original chips required a LOCK prefix.  I don't know whether or not
the LOCK is honored with other read/write instructions.

-- 
Mike McNally                                    Lynx Real-Time Systems
uucp: {voder,athsys}!lynx!m5                    phone: 408 370 2233

            Where equal mind and contest equal, go.

rec@dg.dg.com (Robert Cousins) (06/20/89)

In article <5742@lynx.UUCP> m5@lynx.UUCP (Mike McNally) writes:
>In article <577@lakart.UUCP> dg@lakart.UUCP (David Goodenough) writes:
>>I have never understood the need for a test and set instruction, when
>>you can make do with adc (add with carry). Allow me to explain:
>>
>>The point behind TAS is to allow a process to test if a flag is set or
>>clear, and set it no matter what the result. But why does the test have
>>to be in the same instruction? 
>
>The example given by Mr. Goodenough in fact incorporates the changing of 
>the state of the flag in one instruction (the add-with-carry).  It is
>thus true that the sequence is unbreakable *at the OS level*: a normal
>OS will not reschedule while a task is in the middle of an instruction,
>because most CPUs won't allow interrupts in the middle of an instruction.
>(Note that this is not necessarily the case.)  A real TAS instruction
>often comes with the proviso that the bus cycles used to fetch and store
>are not interruptible either.  This guarantee is necessary in a
>multi-processor environment.
>
>I think that the x86 (x>0) series locks the bus on all XCHG instructions.
>The original chips required a LOCK prefix.  I don't know whether or not
>the LOCK is honored with other read/write instructions.

Actually, the LOCK prefix was somewhat more powerful than originally
intended in initial 8086 family products.  One could use the LOCK prefix
before the REP prefix to build a locked string operation!  Since these could
be up to 64K iterations long and since the 8086 isn't that fast, it was
theoretically possible to lock other processors from the bus for extended
periods of time.

There is another reason why atomic operations are useful:  whenever
there is some modicum of peripheral intelligence (as is commonly found
with modern LAN controller chips), there arise cases in which memory
descriptors need to be updated in a controlled fashion.  For example,
after building a packet in memory, the packet must be linked into the
controller's outgoing packet list.  Since the controller may be actively
transmitting at that instant or, worse yet, may be traversing links in
the list to find the next packet, an atomic operation makes possible a "seamless"
insertion into the list.  However, relatively few systems are designed to
take advantage of this feature.

The interlocked exchange operation is perhaps the most common tool for
multiprocessor operation.  Using it, one can simulate the test-and-set
operation, the test-and-clear operation and through careful use of global
values, sequenced locks and integer semaphores become practical.  Some 
CPU families go out of their way to add interlocked operations.  The DG
MV series and the NSC 32000 have a list of instructions which operate in
this fashion.
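
Simulating test-and-set and test-and-clear with an interlocked exchange
might look like the following in C11.  This is a sketch, not any
particular machine's instruction; test_and_set, test_and_clear and
xchg_demo are names chosen here, and atomic_exchange stands in for the
hardware exchange.

```c
#include <assert.h>
#include <stdatomic.h>

/* Swap in 1 and return the previous value: 0 means we just set it. */
int test_and_set(atomic_int *flag)
{
    return atomic_exchange(flag, 1);
}

/* Swap in 0 and return the previous value: 1 means we just cleared it. */
int test_and_clear(atomic_int *flag)
{
    return atomic_exchange(flag, 0);
}

/* Single-threaded walk-through of both operations. */
void xchg_demo(void)
{
    atomic_int flag = 0;
    assert(test_and_set(&flag) == 0);    /* was clear, now set */
    assert(test_and_set(&flag) == 1);    /* already set */
    assert(test_and_clear(&flag) == 1);  /* was set, now clear */
    assert(test_and_clear(&flag) == 0);  /* already clear */
}
```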

BTW, the TAS instruction makes barrier synchronization much simpler.  Without
it, writing a ROM to handle 'n' processors coming out of reset at the same
time and trampling over each other would not be as easy.

Robert Cousins
Dept. Mgr, Workstation Dev't.
Data General Corp.

Speaking for Myself alone.

>-- 
>Mike McNally                                    Lynx Real-Time Systems
>uucp: {voder,athsys}!lynx!m5                    phone: 408 370 2233
>
>            Where equal mind and contest equal, go.

davidsen@sungod.crd.ge.com (William Davidsen) (06/20/89)

In article <5742@lynx.UUCP> m5@lynx.UUCP (Mike McNally) writes:

| I think that the x86 (x>0) series locks the bus on all XCHG instructions.
| The original chips required a LOCK prefix.  I don't know whether or not
| the LOCK is honored with other read/write instructions.

  Specified to lock the bus until the next instruction is complete. This
is a reasonable way to allow multiple processors to use any appropriate
interlock. I don't really like the ADDC for flag testing, since some
logic paths may require a loop until free (for short term resources) and
something could overflow.

  Why was this posted to wizards instead of arch????
	bill davidsen		(davidsen@crdos1.crd.GE.COM)
  {uunet | philabs}!crdgw1!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me

gwyn@smoke.BRL.MIL (Doug Gwyn) (07/22/89)

In article <577@lakart.UUCP> dg@lakart.UUCP (David Goodenough) writes:
>So the bottom line is all you need is the ability to capture the state
>of a bit, and set it no matter what, all in one atomic instruction.

The key is that it be atomic; not all "add with carry" instructions are.
On the PDP-11, we used to use something like TST and INCB as the two
semaphore basic instructions; it was tricky due to the bus supporting
both byte and word transfers.

When generalizing to multiprocessor architectures, many designers seem
to have found "test and set" more suitable for the purposes of basic
synchronization than an arithmetic operation would be.

jmm@ecijmm.UUCP (John Macdonald) (07/22/89)

In article <577@lakart.UUCP> dg@lakart.UUCP (David Goodenough) writes:
> [quoted material deleted]
>
>I have never understood the need for a test and set instruction, when
>you can make do with adc (add with carry). Allow me to explain:
>
>The point behind TAS is to allow a process to test if a flag is set or
>clear, and set it no matter what the result. But why does the test have
>to be in the same instruction? In fact all that is needed is the ability
>to capture the state of a bit, setting it as you do the capture, and
>test it later. If you think about the following:
>
>	[example of multitasking use of ADC deleted]
>
>Note that Task 1 got interrupted between sampling the bit, and testing it,
>_BUT_IT_DIDN'T_MAKE_ANY_DIFFERENCE_ - the system still worked.
>
>So the bottom line is all you need is the ability to capture the state
>of a bit, and set it no matter what, all in one atomic instruction.

This is true for a single-processor multi-tasking situation.  There is a
stronger requirement for a multi-processor shared memory situation.  In
that case, there must be provision for the atomic instruction to:

1. Read and check the status of the old value.
2. Change to a (possibly) new value.
3. Write back the new value.

(the same as described above, plus:)

4. Ensure that no other processor can access the old value between
    steps 1 and 3!

In many processors, most read-modify-write instructions release their
access path to the memory during step 2 and then regain it for step 3.
This allows another processor to use the memory path without waiting.
In such processors, there is generally a small number of instructions
which are guaranteed to not release the memory path.  For example, on
the Motorola 68020, the TAS (test and set), CAS (compare and swap),
and CAS2 (compare and swap twice) instructions all lock the memory
bus for the duration of all of their accesses; while other instructions
(e.g. add immediate to memory) which have a read-modify-write pattern
do not.  This type of design trades off increased speed for the
non-locking operations against the requirement that the programmer use one
of the locking instructions whenever there may be a multi-processor
simultaneous access to the datum.
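
As an illustration of using a locking instruction for a read-modify-write,
here is a compare-and-swap retry loop in C11.  It is a sketch only: it
uses atomic_compare_exchange_weak rather than the 68020 CAS instruction
itself, and locked_add and cas_demo are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>

/* Adds delta to *datum so that no concurrent update is lost.
 * Returns the value that this call stored. */
int locked_add(atomic_int *datum, int delta)
{
    int old = atomic_load(datum);             /* step 1: read the old value */
    /* steps 2-3: compute the new value and write it back only if the
     * datum still holds 'old'; on failure 'old' is refreshed with the
     * current contents and the loop tries again. */
    while (!atomic_compare_exchange_weak(datum, &old, old + delta)) {
        /* another processor got in between; retry with the new 'old' */
    }
    return old + delta;
}

/* Single-threaded walk-through. */
void cas_demo(void)
{
    atomic_int counter = 5;
    assert(locked_add(&counter, 3) == 8);
    assert(locked_add(&counter, -8) == 0);
    assert(atomic_load(&counter) == 0);
}
```

A plain "counter += 3" through a non-atomic int would correspond to the
non-locking add-to-memory case, where two processors can both read the
same old value and one update is lost.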
-- 
John Macdonald