[comp.arch] H/W Write Buffers, S/W Synchronizat

aglew@ccvaxa.UUCP (10/31/87)

/* Written  1:28 pm  Oct 26, 1987 by viggy@hpsal2.HP.COM in ccvaxa:comp.arch */
/* ---------- "H/W Write Buffers, S/W Synchronizat" ---------- */


    Hardware Write Buffers and Software Synchronization

I have been looking into a problem involving software synchronization
using shared variables in tightly coupled private cache multiprocessor
systems with hardware write buffers.

I would like to hear other people's experiences with the problem: should
an architecture allow the performance advantage of write buffers and
restrict the way software synchronizes?

Hardware write (store) buffers provide queueing to smooth out the total
instruction flow by allowing the execution unit to proceed in spite of
unpredictable delays caused by the storage unit (cache miss).  In a
shared memory, private cache (write back) multiprocessor system, a
write buffer can cause temporary staleness of data.  If such data is
being shared between processes that are executing on different processors,
as in the example below, the result can be serious inconsistencies or
deadlocks.

             Master                              Slave

Create work;                           Consume work;
Block;                                 completed++;
available++;                           if (completed < available)
if ((available - completed) > 1)           wakeup(master);
    sleep;                             else sleep;
else
    wakeup (slave);

In this example, synchronization is accomplished by modifying the shared
variables 'available' and 'completed'.  Changes to these variables are not
instantaneously visible to the other processor modules: a store can still be
sitting in one processor's write buffer while the other processor reads the
old value from its own cache.  Each side can then test a stale count, decide
there is nothing to do, and go to sleep, so both master and slave sleep
forever.

The question is not "how to synchronize with write buffers", but rather the
following:

  1.  How much existing code already synchronizes this way?

  2.  Is it difficult to write software under such a restriction?

  3.  Would it be appropriate to force software writers to identify shared
      variables?
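
To make question 3 concrete, here is a minimal sketch of what "identifying
shared variables" can look like, assuming C11 atomics and POSIX threads
(neither of which is part of the original discussion).  The names run_once,
master and slave are hypothetical, and the sleep/wakeup calls of the example
above are left out; only the "bump my counter, then read the other one" step
is shown.  Declaring the counters atomic_int and using the default
(sequentially consistent) operations means the two sides cannot both read a
stale value of the other's counter.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* The shared variables are explicitly identified as atomic. */
static atomic_int available;
static atomic_int completed;

/* Master side: publish one unit of work, then look at 'completed'. */
static void *master(void *seen)
{
    atomic_fetch_add(&available, 1);          /* available++ */
    *(int *)seen = atomic_load(&completed);   /* what did we observe? */
    return NULL;
}

/* Slave side: report one completion, then look at 'available'. */
static void *slave(void *seen)
{
    atomic_fetch_add(&completed, 1);          /* completed++ */
    *(int *)seen = atomic_load(&available);
    return NULL;
}

/* Runs both sides once.  Returns 1 if at least one side observed the
   other's increment, 0 if both read the stale value 0. */
int run_once(void)
{
    pthread_t m, s;
    int master_saw = -1, slave_saw = -1;
    int rc;

    atomic_store(&available, 0);
    atomic_store(&completed, 0);

    rc = pthread_create(&m, NULL, master, &master_saw);
    assert(rc == 0);
    rc = pthread_create(&s, NULL, slave, &slave_saw);
    assert(rc == 0);
    (void)rc;

    pthread_join(m, NULL);
    pthread_join(s, NULL);

    return master_saw == 1 || slave_saw == 1;
}
```

With plain int counters on a machine with write buffers, nothing rules out
both sides reading 0.  With the sequentially consistent operations above,
run_once() should always return 1, at the cost of the ordering the hardware
has to enforce on every access to the marked variables.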

John Mashey, are you listening?

Viggy Mokkarala (hplabs!hpda!viggy)
(408)447-5983
19420 Homestead Road, Cupertino, CA 95014.
/* End of text from ccvaxa:comp.arch */

aglew@ccvaxa.UUCP (11/02/87)

...> Write buffering

There are already multiprocessor systems out there that have write buffering
(even though the cache may be write through, it doesn't mean that the write
immediately gets to memory). It seems that there are a lot of algorithms
that don't really need an immediately consistent view of the data - they
just need _eventually_ consistent data.

(I thought that the term "eventually consistent" was my own invention until
I heard a guy from Xerox PARC give a talk on it, wrt. networked databases.
I've been using it wrt. caches and memory systems. Same idea, different
scale.)

Some other posters have talked about read-modify-writes in connection with
write buffering. They are not quite the same issue. Consider:

	Processor 1		Processor 2
	TSET  L			...
	STORE 1,A	     g: TSET L
	STORE 2,A		BNZ  g
	TCLR  L			LOAD R1 <- A

The example is contrived - what I want is Processor 1 doing a series of writes,
and then clearing a lock; Processor 2 acquiring a lock, and then looking at the
data structure.
    You might conceivably let the test and set bypass the memory queues, in
which case R1 might be loaded with 1 instead of 2.
    Or, you can require sequential semantics, so the TSET waits until all
other processors' writes have gone through; equivalently, you might stylize
and say that TCLR waits until all of this processor's writes have gone through.
    Except that waiting until the write buffers have emptied might take a long
time, especially on a system with several layers of cache, and it might require
a lot of expensive interprocessor communication. Requiring memory
synchronization on all lock activities penalizes a lot of algorithms where
the semaphore is actually the communications channel, not protecting other
data structures.
    So, there should be locks that wait until memory is synchronized, and
ones that don't.
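
A rough modern analogue of the two kinds of lock, as a sketch only: it
assumes C11 atomics and POSIX threads, and the mapping of TSET/TCLR onto
atomic_flag is my illustration, not a description of any particular machine.
The function names processor2 and run_handoff are hypothetical.  Clearing
the lock with memory_order_release plays the part of "TCLR waits until all
of this processor's writes have gone through"; memory_order_relaxed would be
the lock that does not wait.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static atomic_flag L = ATOMIC_FLAG_INIT;   /* the lock L */
static int A;                              /* ordinary data guarded by L */

/* Processor 2:  g: TSET L / BNZ g / LOAD R1 <- A */
static void *processor2(void *r1)
{
    while (atomic_flag_test_and_set_explicit(&L, memory_order_acquire))
        ;                                  /* spin until the lock is ours */
    *(int *)r1 = A;                        /* sees every write before TCLR */
    atomic_flag_clear_explicit(&L, memory_order_release);
    return NULL;
}

/* Processor 1 takes the lock before Processor 2 starts, does its series
   of writes, then clears the lock.  Returns what Processor 2 loaded. */
int run_handoff(void)
{
    pthread_t t;
    int r1 = -1;
    int rc;

    A = 0;
    atomic_flag_clear(&L);

    while (atomic_flag_test_and_set_explicit(&L, memory_order_acquire))
        ;                                  /* TSET  L */

    rc = pthread_create(&t, NULL, processor2, &r1);
    assert(rc == 0);
    (void)rc;

    A = 1;                                 /* STORE 1,A */
    A = 2;                                 /* STORE 2,A */

    /* TCLR L: the release ordering makes both stores visible to
       whoever acquires L next. */
    atomic_flag_clear_explicit(&L, memory_order_release);

    pthread_join(t, NULL);
    return r1;
}
```

Here run_handoff() returns 2.  If the clear and the test-and-set were both
relaxed, the accesses to A would be a data race in C11 terms and the value
loaded would no longer be guaranteed, which is the "R1 might be loaded with
1 instead of 2" case in language-level dress.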

Also, because both locking and memory synchronization may take a long time,
these activities should be split up so that optimistic algorithms can be
used. E.g.
    START-SYNCHRONIZING-MEMORY
		FROM-OTHER-PROCESSORS
		FROM-THIS-PROCESSOR
    WAIT-UNTIL-SYNCHRONIZED
So you can start the expensive operation as soon as you have written the stuff
that needs to be synchronous, but keep doing other work while it proceeds.


Andy "Krazy" Glew. Gould CSD-Urbana.    USEnet:  ihnp4!uiucdcs!ccvaxa!aglew
1101 E. University, Urbana, IL 61801    ARPAnet: aglew@gswd-vms.arpa

I always felt that disclaimers were silly and affected, but there are people
who let themselves be affected by silly things, so: my opinions are my own,
and not the opinions of my employer, or any other organisation with which I am
affiliated. I indicate my employer only so that other people may account for
any possible bias I may have towards my employer's products or systems.