[comp.parallel] Memory access conflicts

cik@l.cc.purdue.edu (Herman Rubin) (08/29/88)

There are situations in which it is likely that processors operating in
parallel may have to address the same item in memory at the same time.
What are the consequences in program time for conflicts of this type?
What hardware features are there to possibly circumvent this problem if
it is serious?
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)

fpst@hubcap.clemson.edu (Steve Stevenson-Moderator) (08/29/88)

> There are situations in which it is likely that processors operating in
> parallel may have to address the same item in memory at the same time.
> What are the consequences in program time for conflicts of this type?
> What hardware features are there to possibly circumvent this problem if
> it is serious?

This is a very old and serious problem.  It has occupied a lot of
pages in the performance literature.  Hardware folks became aware of it
early on.

Given your ``residence'' you might try Coffman and Denning, "Operating
Systems Theory" --- old, but it has all the queuing stuff in it.

There are probably newer treatments, but the problem's the same.

Steve Stevenson                            fpst@hubcap.clemson.edu
(aka D. E. Stevenson),                     fpst@prism.clemson.csnet
Department of Computer Science,            comp.parallel
Clemson University, Clemson, SC 29634-1906 (803)656-5880.mabell

dfk@grad4.cs.duke.edu (David F. Kotz) (08/29/88)

In article <2886@hubcap.UUCP>, cik@l.cc.purdue.edu (Herman Rubin) writes:
> There are situations in which it is likely that processors operating in
> parallel may have to address the same item in memory at the same time.
> What are the consequences in program time for conflicts of this type?
> What hardware features are there to possibly circumvent this problem if
> it is serious?

The consequences of memory conflicts are quite serious and can
completely bog down a parallel program. The results depend strongly on
the particular architecture and, of course, the implementation of the
algorithm. 

Usually the parallel processes in a program need not access the same
memory location to conflict; it is enough that they access the same
memory *module*. A standard memory module can service only one access
at a time, so all accesses to that module are serialized. Most
parallel computers (and many uniprocessor computers) have several
memory modules, so the best strategy is for the processors to spread
their accesses across the memories as much as possible, and to avoid
accessing memory when it is not necessary.
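
To make the module picture concrete: on many designs consecutive
addresses are interleaved across the modules, so the module number is
just the address modulo the module count. A small C illustration (the
module count and the mapping are my own assumptions, not any particular
machine's):

#include <stdio.h>

#define NMODULES 8                      /* assumed number of memory modules */

static int module_of(unsigned long addr)
{
    return (int)(addr % NMODULES);      /* low-order interleaving */
}

int main(void)
{
    /* Unit stride touches every module in turn; a stride that is a
       multiple of NMODULES sends every reference to the same module,
       where the accesses are serialized. */
    printf("stride 1:");
    for (int i = 0; i < 8; i++)
        printf(" %d", module_of((unsigned long)i));        /* 0 1 2 ... 7 */
    printf("\nstride 8:");
    for (int i = 0; i < 8; i++)
        printf(" %d", module_of((unsigned long)(8 * i)));  /* 0 0 0 ... 0 */
    printf("\n");
    return 0;
}

The same effect shows up in real programs as, for example, walking down
a column of a matrix whose row length is a multiple of the module count.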

Further conflicts come from the communication network used to connect
the processors to the memories. In a bus-based architecture, all of
the memory accesses go through the single bus and are hence serialized
at that point, providing an even worse bottleneck than serialization
at the memory module. Other architectures include omega (Butterfly)
networks, crossbar switches, and hypercubes. 
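
For the multistage networks, the path a request takes is fixed by the
destination module number, one bit per stage, which is why two requests
headed for the same module collide as soon as their paths merge. A toy
destination-tag routing sketch (three stages assumed; this is my own
illustration, not BBN's or NYU's switch logic):

#include <stdio.h>

#define STAGES 3                        /* log2(number of modules), assumed */

static void route(int dest)
{
    printf("to module %d:", dest);
    for (int s = 0; s < STAGES; s++) {
        int port = (dest >> (STAGES - 1 - s)) & 1;  /* 0 = upper, 1 = lower */
        printf("  stage %d -> port %d", s, port);
    }
    printf("\n");
}

int main(void)
{
    route(5);   /* binary 101 */
    route(3);   /* binary 011 */
    return 0;
}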

I have personally seen extreme cases of memory conflicts in a BBN
Butterfly Parallel Processor, not only from a program banging on one
particular memory location (a shared counter or central pointer) but
also due to unbalanced access to the memories of the processors.
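
The shared-counter hot spot is easy to reproduce on any shared-memory
machine. A rough sketch in modern C (POSIX threads, nothing
Butterfly-specific): every thread bumping one global counter versus
each thread counting privately and summing once at the end.

#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
#define NITERS   1000000L

static long shared_counter = 0;         /* every increment hits one location */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *hot(void *arg)             /* all threads bang on one counter */
{
    (void)arg;
    for (long i = 0; i < NITERS; i++) {
        pthread_mutex_lock(&lock);
        shared_counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

static void *cool(void *arg)            /* each thread counts privately */
{
    long *mine = arg;
    for (long i = 0; i < NITERS; i++)
        (*mine)++;
    return NULL;
}

int main(void)
{
    pthread_t t[NTHREADS];
    long priv[NTHREADS] = {0};
    long total = 0;

    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, hot, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);

    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, cool, &priv[i]);
    for (int i = 0; i < NTHREADS; i++) {
        pthread_join(t[i], NULL);
        total += priv[i];
    }

    printf("shared: %ld  private-then-summed: %ld\n", shared_counter, total);
    return 0;
}

The point of the contrast is that the hot loop serializes every single
increment at one location, while the private counts never conflict
until the final sum.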

One hardware alternative is the combining network, which was planned
to be used in the NYU Ultracomputer. Memory accesses passing through
this hierarchical network are *combined* when they refer to the same
memory location, so only one access reaches the memory module itself;
the responses are re-expanded on the way back through the network.
Multiple reads of a location result in one read at the module, with
the value sent back to all requesting processors. Multiple writes
arbitrarily choose one of the values to be the "last" one written and
store that value in the location. Fetch-and-add is another, more
complex operation: it atomically replaces x with x+y, returns the old
value of x, and also exploits the combining feature.
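
Because it hands back the old value, fetch-and-add lets many processors
each grab a distinct piece of work with a single operation. A minimal
C11 sketch of the semantics (the Ultracomputer proposed doing this in
the network switches, not in software):

#include <stdatomic.h>
#include <stdio.h>

static atomic_long next_index = 0;      /* shared work counter */

static long fetch_and_add(atomic_long *x, long y)
{
    return atomic_fetch_add(x, y);      /* returns the value before the add */
}

int main(void)
{
    /* Two "processors" each claim one slot; combined or not, they
       receive distinct indices because each gets the pre-add value. */
    long a = fetch_and_add(&next_index, 1);
    long b = fetch_and_add(&next_index, 1);
    printf("a=%ld b=%ld next=%ld\n", a, b, (long)atomic_load(&next_index));
    return 0;
}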

Some other reading:
@book {stone:book,
    author = "Harold S. Stone",
    title = "High Performance Computer Architecture",
    year = 1987,
    publisher = "Addison-Wesley"
}

@article{gottlieb:nyu,
    author = "Allan Gottlieb and Ralph Grishman and Clyde P. Kruskal and
              Kevin P. McAuliffe and Larry Rudolph and Marc Snir",
    title = "The {NYU Ultracomputer} --- {Designing} an {MIMD} Shared
             Memory Parallel Computer",
    journal = "IEEE Transactions on Computers",
    year = 1983,
    month = feb,
    volume = "C-32",
    number = 2,
    pages = "175--189"
}

David Kotz
Department of Computer Science, Duke University, Durham, NC 27706
ARPA:	dfk@cs.duke.edu
CSNET:	dfk@duke        
UUCP:	decvax!duke!dfk