lindsay@K.GP.CS.CMU.EDU (Donald Lindsay) (09/22/87)
[Sorry if this repeats a previous post: I am assuming it got lost.]

To recap: Amdahl's Law says that a parallel computation cannot run any
faster than its inherently sequential portion.

Answer #1: reformulate. For example, at least one supercomputer (the
S1?) has instruction set support for recurrences. But a good
mathematician may be able to map your recurrence relation onto another
problem which has a nonrecurrent solution (that is then mapped back).
Even better is to discover that you didn't need the answer. Einstein's
coworkers report that he would phone them and say, "You know, I've been
thinking, and we don't really need to do this."

Answer #2: replicate. Don't solve one problem faster: solve N problems
at once. This doesn't help the "time to solution", but it's surely
cheaper.

Answer #3: better algorithm. If I could summarize this topic in one
paragraph, I would write "Secrets of the Cray Masters" and retire.

Answer #3.1: amortize. For example, it would seem that you must spend
at least one instruction/object to deallocate a piece of memory.
However, a garbage collector which waits until memory is mostly garbage
can just copy the good stuff down to the end, and then announce that
the rest is garbage. Under the right conditions (like a gigabyte
workstation) this could cost <<less>> than one instruction-time/object.

Answer #3.2: dilute. Cleve Moler at Intel Scientific Computers defines
the "Amdahl fraction" alpha to be the time-fraction of an algorithm
which is "not parallelizable". He claims that alpha is often a
decreasing function of problem size, and goes to zero as problem size
goes to infinity. He refers to these as "effective algorithms" and has
examples.

Answer #3.3: loop restructuring. Today's vectorizing compilers can flip
two nested loops so that they compute the same result, but have a
vectorizable inner loop.

Answer #3.4: trace scheduling. This is just the latest wrinkle in a
long line of clever compiler optimizations, which collectively reduce
the amount of computation (hoisting things out of loops, computing
repeated expressions once) or keep the hardware busier (prefetching,
scheduling). Scheduling is a win because it finds usable parallelism.

Answer #3.5: precompute. For example, don't call a routine for sin(x) -
have a sine table. This can be carried much farther by, e.g.,
generating special-purpose programs instead of using general-purpose
ones. Symbolic manipulation systems occasionally win big.

Answer #3.6: clairvoyant circuits. Rumored to have been used by
occasional desperate graduate students.

(A few small sketches of some of these answers are appended below.)

--
Don	lindsay@k.gp.cs.cmu.edu	CMU Computer Science
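
Sketch for Answer #1 (reformulate): one very small instance of the idea,
in C. The particular recurrence, constants, and function names are my
own made-up illustration, not anything from the S1. A first-order linear
recurrence has a serial chain, but its closed form lets every element be
computed independently, hence in parallel.

    #include <math.h>
    #include <stdio.h>

    /* Sequential form: each x[i] depends on x[i-1], so the loop is a
     * chain.  Recurrence: x[i] = a*x[i-1] + b, with x[0] given.
     */
    static void recur_seq(double *x, int n, double a, double b)
    {
        for (int i = 1; i < n; i++)
            x[i] = a * x[i-1] + b;
    }

    /* Reformulated: the same recurrence has the closed form
     *   x[i] = a^i * x[0] + b * (a^i - 1) / (a - 1)     (a != 1)
     * so every element can be computed independently - no serial chain.
     */
    static void recur_closed(double *x, int n, double a, double b)
    {
        for (int i = 1; i < n; i++) {      /* each iteration independent */
            double ai = pow(a, i);
            x[i] = ai * x[0] + b * (ai - 1.0) / (a - 1.0);
        }
    }

    int main(void)
    {
        double x[8] = { 1.0 }, y[8] = { 1.0 };
        recur_seq(x, 8, 0.5, 2.0);
        recur_closed(y, 8, 0.5, 2.0);
        for (int i = 0; i < 8; i++)
            printf("%g %g\n", x[i], y[i]);   /* the two columns agree */
        return 0;
    }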
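
Sketch for Answer #3.1 (amortize): the claim is really just arithmetic.
Here is a back-of-the-envelope version; the object count, live fraction,
and per-copy cost are assumed numbers, not measurements of any real
collector.

    #include <stdio.h>

    /* A copying collector touches only the LIVE objects; everything
     * else is reclaimed for free when the space is flipped.  If memory
     * is mostly garbage at collection time, the copying work divided by
     * the number of objects reclaimed drops below one instruction per
     * object.
     */
    int main(void)
    {
        const double objects       = 1e6;   /* allocated since last GC  */
        const double live_fraction = 0.01;  /* assumed: 1% still live   */
        const double copy_cost     = 50.0;  /* assumed instrs per copy  */

        double live      = objects * live_fraction;
        double reclaimed = objects - live;
        double per_dead  = (live * copy_cost) / reclaimed;

        printf("instructions per reclaimed object: %.3f\n", per_dead);
        return 0;
    }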
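
Sketch for Answer #3.2 (dilute): Moler's argument can be made concrete
with an assumed cost model. The sketch below supposes a serial setup
phase of n^2 operations in front of n^3 parallelizable operations
(roughly a dense matrix multiply); the processor count and the cost
model are my assumptions, not Moler's own example.

    #include <stdio.h>

    /* With serial cost n^2 and parallel cost n^3, the Amdahl fraction is
     * alpha = n^2 / (n^2 + n^3) = 1/(1+n), which goes to zero as n grows,
     * so the Amdahl bound 1/(alpha + (1-alpha)/P) approaches P.
     */
    int main(void)
    {
        const double P = 1024.0;            /* assumed processor count */
        for (int n = 10; n <= 100000; n *= 10) {
            double serial   = (double)n * n;
            double parallel = (double)n * n * n;
            double alpha    = serial / (serial + parallel);
            double speedup  = 1.0 / (alpha + (1.0 - alpha) / P);
            printf("n = %6d   alpha = %.6f   speedup bound = %.1f\n",
                   n, alpha, speedup);
        }
        return 0;
    }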
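
Sketch for Answer #3.3 (loop restructuring): loop interchange in its
usual textbook form. The array names and the particular recurrence are
my own example; the point is only that the two versions compute the same
values while the second has a dependence-free, vectorizable inner loop.

    #include <stdio.h>

    #define N 256

    static double a[N][N], b[N][N];

    /* Before: the inner loop carries the recurrence (a[i][j] needs
     * a[i][j-1]), so a vectorizer cannot vectorize it.
     */
    static void sweep_before(void)
    {
        for (int i = 0; i < N; i++)
            for (int j = 1; j < N; j++)
                a[i][j] = a[i][j-1] + b[i][j];
    }

    /* After interchange: the recurrence is carried by the OUTER j loop;
     * the inner loop over i touches independent rows and vectorizes.
     * Both versions compute exactly the same result.
     */
    static void sweep_after(void)
    {
        for (int j = 1; j < N; j++)
            for (int i = 0; i < N; i++)
                a[i][j] = a[i][j-1] + b[i][j];
    }

    int main(void)
    {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) { a[i][j] = 0.0; b[i][j] = 1.0; }
        sweep_after();                 /* same answer as sweep_before() */
        printf("a[%d][%d] = %g\n", N-1, N-1, a[N-1][N-1]);  /* 255 */
        return 0;
    }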
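
Sketch for Answer #3.4: trace scheduling itself needs a compiler (it
reorders operations across basic-block boundaries), but the simpler
optimizations named there (hoisting loop invariants, computing repeated
expressions once) look like this when done by hand. The function and
variable names are illustrative.

    #include <math.h>
    #include <stdio.h>

    #define N 1000

    /* Before: sqrt(scale) is loop-invariant, and (x*x) is computed
     * twice per trip.
     */
    static double work_before(const double *v, double x, double scale)
    {
        double s = 0.0;
        for (int i = 0; i < N; i++)
            s += v[i] * sqrt(scale) + (x * x) * v[i] - (x * x);
        return s;
    }

    /* After: the invariant is hoisted out of the loop and the repeated
     * expression is computed once.  Same result, less work per trip.
     */
    static double work_after(const double *v, double x, double scale)
    {
        double s = 0.0;
        double root = sqrt(scale);     /* hoisted loop invariant */
        double xx   = x * x;           /* common subexpression   */
        for (int i = 0; i < N; i++)
            s += v[i] * root + xx * v[i] - xx;
        return s;
    }

    int main(void)
    {
        double v[N];
        for (int i = 0; i < N; i++) v[i] = 1.0;
        printf("%f %f\n", work_before(v, 2.0, 9.0),
                          work_after(v, 2.0, 9.0));
        return 0;
    }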
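
Sketch for Answer #3.5 (precompute): a minimal sine table. The table
size and the wrap-around handling are arbitrary choices of mine; a real
table would likely interpolate between entries.

    #include <math.h>
    #include <stdio.h>

    #define PI         3.14159265358979323846
    #define TABLE_SIZE 1024              /* assumed table granularity */

    static double sine_table[TABLE_SIZE];

    /* Pay for the library sin() once, up front. */
    static void init_sine_table(void)
    {
        for (int i = 0; i < TABLE_SIZE; i++)
            sine_table[i] = sin(2.0 * PI * i / TABLE_SIZE);
    }

    /* Afterwards each "call" is an index computation plus one load,
     * at the price of table-resolution accuracy.  Assumes x >= 0.
     */
    static double fast_sin(double x)
    {
        int i = (int)(x / (2.0 * PI) * TABLE_SIZE) % TABLE_SIZE;
        return sine_table[i];
    }

    int main(void)
    {
        init_sine_table();
        printf("table: %f   libm: %f\n", fast_sin(1.0), sin(1.0));
        return 0;
    }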