jones%HERKY.CS.UIOWA.EDU@VM1.NODAK.EDU (Douglas Jones) (12/07/88)
[I found this on comp.theory. Sorry to repost it for those readers, but I thought it was worth it. We haven't gotten into the theoretical models much --- this might be the time. Steve]

In Hai-Ning Liu's theorynet note of Dec. 5, 1988, he asks:

> I am interested in results about communication delay between any two
> processors. Note this is different from PRAM model.

I have long suspected that there is a natural law that communication delays in a system with n components are O(log n). As a proposed natural law, this is subject to counterexample and refinement, but cannot be proven.

Note that most Parallel Random Access Memory (PRAM) models assume that access time is constant, independent of memory size and number of processors, but real parallel random access memory cannot be built this way. To see this, imagine any fixed technology. All known technologies build memory from modules with a fixed capacity per module. Even if there is only one processor, that processor must have a way to send memory addresses and data to all of the modules. All known technologies have a limit on the fanout allowed on any signal. If there are more modules than this fanout limit, some kind of repeaters or amplifiers must be used.

Say the fanout limit allows a maximum of k modules to listen to any signal. In general, this requires a k-ary tree of amplifiers to take address or data signals from the processor to memory. For a sufficiently large memory, the delays in this k-ary tree will dominate the memory access time, so we have O(log n) access, with logarithms to the base k. A similar argument applies to the data path from memory to processors, but here the problem is the limited fanin of feasible multiplexors, with the result that a multiplexor tree is needed to bring signals from the modules to the processor. Similar arguments apply to n processors sharing one memory, and these arguments directly underlie the design of the butterfly switch.
In multiprocessor systems with message passing, one can make similar arguments about the message routing hardware.

Note that all of these arguments rest on the key phrase "all known technologies". I know of no way to prove that someone tomorrow cannot find a new technology that evades these limits; such a new technology might, after all, rest on now-unknown physical principles. At this point, a digression into the realm of Karl Popper is appropriate, but not in this newsgroup.

Douglas Jones
jones@herky.cs.uiowa.edu
andy@cayuga.stanford.edu (Andy Freeman) (12/14/88)
In article <3781@hubcap.UUCP> jones%HERKY.CS.UIOWA.EDU@VM1.NODAK.EDU (Douglas Jones) writes:
>I have long suspected that there is a natural law that communication delays
>in a system with n components are O(log n). As a proposed natural law, this
>is subject to counterexample and refinement, but cannot be proven.

[Jones' law is based on fixed fan-out.]

If one assumes that device size is a constant independent of the total number of devices, and that the speed of light really is a limit on communication speed, one can prove that communication latency must be Omega(cube-root(n)). (Omega denotes a lower bound, Big-O an upper bound.)

If one also assumes standard thermodynamics and that the power used per device per unit time is a constant, the latency grows to Omega(square-root(n)), because an arbitrarily large cube of heaters can't be cooled (or powered, for that matter, but heat removal is harder). Of course, if the power used per device per unit time decreases as a function of n (which might be the case for CMOS), this bound doesn't apply. There was an article in Scientific American some time ago on arbitrarily low-power computers. Unfortunately, they're arbitrarily slow as well.

-andy

UUCP:  {arpa gateways, decwrl, uunet, rutgers}!polya.stanford.edu!andy
ARPA:  andy@polya.stanford.edu
(415) 329-1718/723-3088 home/cubicle
Donald.Lindsay@K.GP.CS.CMU.EDU (12/24/88)
In article <3882@hubcap.UUCP> andy@cayuga.stanford.edu (Andy Freeman) writes:
>If one assumes that device size is a constant independent of the total
>number of devices and that the speed of light really is a limit on
>communications speed, one can prove that communication latency must be
>Omega(cube-root(n)). (Omega is the lower bound, Big-O is the upper bound.)
>If one also assumes standard thermodynamics and that the power-
>used/device/unit-time is a constant, the latency grows to
>Omega(square-root(n)) because an arbitrarily large cube of heaters
>can't be cooled ...

This applies to single processors. A MIMD machine, of course, can grow without any reduction in the speed of each node. What suffers is the communication latency between (physically) nonadjacent nodes. Some programs are insensitive to this and can be scaled to a World Machine (as factoring has been).

Another limit is component reliability. Even if we generously assume complete fault tolerance, the Mean Time Between Failures must equal (Mean Time To Repair)/(number of repairmen). If MTBF were smaller than MTTR/Nr, the repair crew would fall further and further behind; the shrinking amount of working hardware would have an increasing MTBF, until we reach equilibrium at MTBF = MTTR/Nr.

Today a system is "large" if it has 10^4 or 10^5 chips. I guess the Cray-3 design is at 10^5 chips and over 10^6 pinouts (16 CPUs in one cubic foot). I don't believe it can be fixed while running, so its MTBF is an interesting question.

Kung has argued that MIMD designs must scale communications linearly with computation (and memory either linearly or as the square, depending on the algorithms of interest). So, as clock rates rise, the bandwidth between cabinets must go up.

Another limit for homogeneous MIMD designs is the wiring pattern. The TF-1 design (32K CPUs) has a cylinder of cables, 40 feet across and 63 layers (6 feet) deep. This pattern has essentially been designed to its limit.
Don lindsay@k.gp.cs.cmu.edu CMU Computer Science
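[A sketch of the repair-crew equilibrium above, in modern Python. The specific MTBF and MTTR figures are illustrative assumptions, not numbers from the thread: failures arrive at rate n/MTBF-per-component, repairs complete at rate Nr/MTTR, and the crew falls behind once the first rate exceeds the second. --- ed.]

```python
def max_sustainable_components(component_mtbf_hours: float,
                               mttr_hours: float,
                               n_repairmen: int) -> int:
    """Largest fleet of components a repair crew can keep up with.
    Failures arrive at rate n / component_mtbf; the crew repairs at
    rate n_repairmen / mttr.  Sustainability requires
        n / component_mtbf <= n_repairmen / mttr,
    i.e. the system-level MTBF (component_mtbf / n) must not drop
    below MTTR / Nr -- the equilibrium in the post above."""
    return int(n_repairmen * component_mtbf_hours / mttr_hours)

# Hypothetical example: chips with a 10^6-hour MTBF, 2-hour repairs,
# and a crew of 4 can sustain at most 2,000,000 chips.
print(max_sustainable_components(1_000_000, 2, 4))
```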