[comp.parallel] communication delays

jones%HERKY.CS.UIOWA.EDU@VM1.NODAK.EDU (Douglas Jones) (12/07/88)

[I found this on comp.theory.  Sorry to repost it for those readers, but
	I thought it was worth it.  We haven't gotten into the
	theoretical models much --- this might be the time.
 Steve
]



In Hai-Ning Liu's theorynet note of Dec. 5, 1988, he asks:

> I am interested in results about communication delay between any two
> processors.  Note this is different from PRAM model.

I have long suspected that there is a natural law that communication delays
in a system with n components are O(log n).  As a proposed natural law, this
is subject to counterexample and refinement, but cannot be proven.

Note that most Parallel Random Access Machine (PRAM) models assume that memory
access time is constant, independent of memory size and number of processors,
but real parallel random access memory cannot be built this way.  To see this,
imagine any fixed technology.  All known technologies build memory from
modules with a fixed capacity per module.  Even if there is only one processor,
that processor must have a way to send memory addresses and data to all of the
modules.  All known technologies have a limit on the fanout allowed on any
signal.  If there are more modules than this fanout limit, some kind of
repeaters or amplifiers must be used.  Say the fanout limit allows a maximum
of k modules to listen to any signal.  In general, this requires a k-ary tree
of amplifiers to take address or data signals from the processor to memory.
For a sufficiently large memory, the delays involved in this k-ary tree will
dominate the memory access time, so we have O(log n) access, with logarithms
to the base k.
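
To make the arithmetic concrete, here is a small sketch of the amplifier-tree
delay (the fanout, per-stage delay, and module delay below are made-up
illustrative figures, not properties of any real part):

    def access_delay_ns(n_modules, fanout=8, stage_ns=2.0, module_ns=50.0):
        """Delay through a k-ary amplifier tree driving n_modules memory
        modules.  Each stage can drive at most `fanout` loads, so the tree
        needs ceil(log_k(n_modules)) stages of repeaters."""
        stages, reach = 0, 1
        while reach < n_modules:
            reach *= fanout
            stages += 1
        return module_ns + stages * stage_ns

    for n in (8, 64, 4096, 2**20):
        print(f"{n:>8} modules: {access_delay_ns(n):5.1f} ns")

For a large enough memory the stage term dominates the fixed module delay,
which is the O(log n) behavior claimed above.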

A similar argument applies to the data path from memory to processors, but
here, the problem is the limited fanin of feasible multiplexors, with the
result that a multiplexor tree is needed to bring signals from modules to the
processor.

Similar arguments apply to n processors sharing one memory, and these arguments
directly underlie the design of the butterfly switch.

In multiprocessor systems with message passing, one can make similar arguments
about the message routing hardware.

Note that all of these arguments rest on the key statement "All known
technologies".  I know of no way to prove that someone tomorrow cannot find a
new technology that evades these limits; such a new technology might, after all,
rest on now-unknown physical principles.  At this point, a digression into the
realm of Karl Popper is appropriate, but not in this newsgroup.

					Douglas Jones
					jones@herky.cs.uiowa.edu

andy@cayuga.stanford.edu (Andy Freeman) (12/14/88)

In article <3781@hubcap.UUCP> jones%HERKY.CS.UIOWA.EDU@VM1.NODAK.EDU (Douglas Jones) writes:
>I have long suspected that there is a natural law that communication delays
>in a system with n components are O(log n).  As a proposed natural law, this
>is subject to counterexample and refinement, but cannot be proven.

[Jones' law is based on fixed fan-out.]

If one assumes that device size is a constant independent of the total
number of devices and that the speed of light really is a limit on
communications speed, one can prove that communication latency must be
Omega(cube-root(n)).  (Omega is the lower bound, Big-O is the upper
bound.)
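
A rough sketch of that bound (the device volume below is an arbitrary
illustrative assumption): n fixed-size devices occupy a volume at least n
times the per-device volume, so the machine's radius grows as the cube root
of n, and a signal crossing it needs at least radius/c time.

    import math

    C = 3.0e8          # speed of light, m/s
    DEVICE_VOL = 1e-9  # assumed volume per device, m^3 (1 mm^3, illustrative)

    def min_latency_3d_ns(n):
        """Lower bound on worst-case latency: n devices packed into a sphere
        of volume n*DEVICE_VOL; a signal must at least cross its radius."""
        radius = (3 * n * DEVICE_VOL / (4 * math.pi)) ** (1.0 / 3.0)
        return radius / C * 1e9   # nanoseconds; grows as Omega(n^(1/3))

    for n in (10**3, 10**6, 10**9, 10**12):
        print(f"n = {n:>13}: latency >= {min_latency_3d_ns(n):6.2f} ns")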

If one also assumes standard thermodynamics and that the power used per device
per unit time is a constant, the bound grows to Omega(square-root(n)): heat can
only be removed through the surface of the machine, so an arbitrarily large
cube of heaters can't be cooled (or powered, for that matter, but heat removal
is harder), and the devices are forced into an essentially two-dimensional
layout.  Of course, if the power used per device per unit time decreases as a
function of n (which might be the case for CMOS), this bound doesn't apply.
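
The square-root version can be sketched the same way (the power per device and
the removable heat flux below are invented illustrative numbers): total heat
n*p has to leave through the machine's surface, whose area grows only as the
square of the radius, so the radius, and hence the crossing time, grows at
least as the square root of n.

    import math

    C = 3.0e8        # speed of light, m/s
    P_DEVICE = 1e-3  # assumed power per device, W (illustrative)
    MAX_FLUX = 1e4   # assumed removable heat flux, W/m^2 (illustrative)

    def min_latency_heat_limited(n):
        """Radius forced by cooling: surface area 4*pi*r^2 must carry away
        n*P_DEVICE watts at no more than MAX_FLUX watts per square meter."""
        radius = math.sqrt(n * P_DEVICE / (4 * math.pi * MAX_FLUX))
        return radius / C   # seconds; grows as Omega(sqrt(n))

    for n in (10**6, 10**9, 10**12):
        print(f"n = {n:>13}: latency >= {min_latency_heat_limited(n):.2e} s")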

There was an article in Scientific American some time ago on
arbitrarily low-power computers.  Unfortunately, they're arbitrarily
slow as well.

-andy
UUCP:  {arpa gateways, decwrl, uunet, rutgers}!polya.stanford.edu!andy
ARPA:  andy@polya.stanford.edu
(415) 329-1718/723-3088 home/cubicle

Donald.Lindsay@K.GP.CS.CMU.EDU (12/24/88)

In article <3882@hubcap.UUCP> andy@cayuga.stanford.edu (Andy Freeman) writes:
>If one assumes that device size is a constant independent of the total
>number of devices and that the speed of light really is a limit on
>communications speed, one can prove that communication latency must be
>Omega(cube-root(n)).  (Omega is the lower bound, Big-O is the upper bound.)
>If one also assumes standard thermodynamics and that the power-
>used/device/unit-time is a constant, the latency grows to
>Omega(square-root(n)) because an arbitrarily large cube of heaters
>can't be cooled ...

This applies to single processors.  A MIMD machine, of course, can grow
without any reduction in the speed of each node. What suffers is the
communications latency between (physically) nonadjacent nodes.  Some
programs are insensitive to this, and can be scaled to a World Machine
(as factoring has been).

Another limit is component reliability.  If we generously assume complete
fault tolerance, then the system's Mean Time Between Failures must be at least
(Mean Time To Repair)/(number of repairmen).  If MTBF were smaller than
MTTR/Nr, the repair crew would fall further and further behind; the shrinking
amount of working hardware would then have an increasing MTBF, until
equilibrium is reached at MTBF = MTTR/Nr.
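
In other words (the chip MTBF, repair time, and crew size below are made-up
numbers, just to show the arithmetic): with n parts the machine as a whole
fails roughly every MTBF_part/n hours, and Nr repairmen can finish at most
Nr/MTTR repairs per hour, so all of the hardware stays in service only while
MTBF_part/n >= MTTR/Nr.

    def crew_keeps_up(n_parts, part_mtbf_h, mttr_h, repairmen):
        """True if repairs can keep pace with failures: system MTBF
        (part_mtbf_h / n_parts) must be at least mttr_h / repairmen."""
        return part_mtbf_h / n_parts >= mttr_h / repairmen

    # Illustrative only: 10^6-hour chip MTBF, 4-hour repairs, a crew of 2.
    print(crew_keeps_up(10**6, 1e6, 4.0, 2))   # False: failures outpace repairs
    print(crew_keeps_up(10**5, 1e6, 4.0, 2))   # True: crew keeps up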

Today a system is "large" if it has 10^4 or 10^5 chips.  I would guess the
Cray-3 design is at about 10^5 chips and over 10^6 pinouts (16 CPUs in one
cubic foot).  I don't believe it can be fixed while running, so its MTBF is
an interesting question.

Kung has argued that MIMD designs must scale communications linearly with
computation (and memory either linearly or as the square, depending on
the algorithms of interest).  So, as clock rates rise, the bandwidth
between cabinets must go up.
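
Read as a rule of thumb (the node count, clock, and words-per-operation below
are invented for illustration, not Kung's figures): if each node has to move a
fixed number of words per operation, the aggregate bandwidth requirement is
just nodes * clock * words-per-operation, so doubling the clock rate doubles
the cabling requirement between cabinets.

    def required_gbit_per_s(nodes, clock_mhz, words_per_op, bits_per_word=64):
        """Aggregate communication bandwidth when communication scales
        linearly with computation."""
        ops_per_s = nodes * clock_mhz * 1e6
        return ops_per_s * words_per_op * bits_per_word / 1e9

    # Illustrative only: 1024 nodes, 0.1 words moved per operation.
    print(required_gbit_per_s(1024, 100, 0.1))   # 100 MHz clock
    print(required_gbit_per_s(1024, 200, 0.1))   # double the clock, double the traffic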

Another limit for homogeneous MIMD designs is the wiring pattern.  The TF-1
design (32K CPUs) has a cylinder of cables, 40 feet across and 63 layers
(6 feet) deep.  This pattern has essentially been designed to its limit.

Don		lindsay@k.gp.cs.cmu.edu    CMU Computer Science