lindsay@K.GP.CS.CMU.EDU (Donald Lindsay) (09/22/87)
[Sorry if this repeats a previous post: I am assuming it got lost.]

To recap: Amdahl's Law says that a parallel computation cannot run any
faster than its inherently sequential portion.

Answer #1: reformulate. For example, at least one supercomputer (the
S1?) has instruction set support for recurrences. But a good
mathematician may be able to map your recurrence relation onto another
problem which has a nonrecurrent solution (that is then mapped back).
Even better is to discover that you didn't need the answer. Einstein's
coworkers report that he would phone them and say, "You know, I've been
thinking, and we don't really need to do this."

Answer #2: replicate. Don't solve one problem faster: solve N problems
at once. This doesn't help the "time to solution", but it's surely
cheaper.

Answer #3: better algorithm. If I could summarize this topic in one
paragraph, I would write "Secrets of the Cray Masters" and retire.

Answer #3.1: amortize. For example, it would seem that you must spend
at least one instruction/object to deallocate a piece of memory.
However, a garbage collector which waits until memory is mostly garbage
can just copy the good stuff down to the end, and then announce that
the rest is garbage. Under the right conditions (like a gigabyte
workstation) this could cost <<less>> than one instruction-time/object.

Answer #3.2: dilute. Cleve Moler at Intel Scientific Computers defines
the "Amdahl fraction" alpha to be the time-fraction of an algorithm
which is "not parallelizable". He claims that alpha is often a
decreasing function of problem size, and goes to zero as problem size
goes to infinity. He refers to these as "effective algorithms" and has
examples.

Answer #3.3: loop restructuring. Today's vectorizing compilers can flip
two nested loops so that they compute the same result, but have a
vectorizable inner loop.

Answer #3.4: trace scheduling. This is just the latest wrinkle in a
long line of clever compiler optimizations, which collectively reduce
the amount of computation (hoisting things out of loops, computing
repeated expressions once) or keep the hardware busier (prefetching,
scheduling). Scheduling is a win because it finds usable parallelism.

Answer #3.5: precompute. For example, don't call a routine for sin(x) -
have a sine table. This can be carried much farther by, e.g.,
generating special-purpose programs instead of using general-purpose
ones. Symbolic manipulation systems occasionally win big.

Answer #3.6: clairvoyant circuits. Rumored to have been used by
occasional desperate graduate students.

(A few small sketches of some of these answers are appended below.)

--
Don	lindsay@k.gp.cs.cmu.edu	CMU Computer Science
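
Sketch for Answer #1 (reformulate): one very small instance of the idea,
in C. The particular recurrence, constants, and function names are my
own made-up illustration, not anything from the S1. A first-order linear
recurrence has a serial chain, but its closed form lets every element be
computed independently, hence in parallel.

    #include <math.h>
    #include <stdio.h>

    /* Sequential form: each x[i] depends on x[i-1], so the loop is a
     * chain.  Recurrence: x[i] = a*x[i-1] + b, with x[0] given.
     */
    static void recur_seq(double *x, int n, double a, double b)
    {
        for (int i = 1; i < n; i++)
            x[i] = a * x[i-1] + b;
    }

    /* Reformulated: the same recurrence has the closed form
     *   x[i] = a^i * x[0] + b * (a^i - 1) / (a - 1)     (a != 1)
     * so every element can be computed independently - no serial chain.
     */
    static void recur_closed(double *x, int n, double a, double b)
    {
        for (int i = 1; i < n; i++) {      /* each iteration independent */
            double ai = pow(a, i);
            x[i] = ai * x[0] + b * (ai - 1.0) / (a - 1.0);
        }
    }

    int main(void)
    {
        double x[8] = { 1.0 }, y[8] = { 1.0 };
        recur_seq(x, 8, 0.5, 2.0);
        recur_closed(y, 8, 0.5, 2.0);
        for (int i = 0; i < 8; i++)
            printf("%g %g\n", x[i], y[i]);   /* the two columns agree */
        return 0;
    }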
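
Sketch for Answer #3.1 (amortize): the claim is really just arithmetic.
Here is a back-of-the-envelope version; the object count, live fraction,
and per-copy cost are assumed numbers, not measurements of any real
collector.

    #include <stdio.h>

    /* A copying collector touches only the LIVE objects; everything
     * else is reclaimed for free when the space is flipped.  If memory
     * is mostly garbage at collection time, the copying work divided by
     * the number of objects reclaimed drops below one instruction per
     * object.
     */
    int main(void)
    {
        const double objects       = 1e6;   /* allocated since last GC  */
        const double live_fraction = 0.01;  /* assumed: 1% still live   */
        const double copy_cost     = 50.0;  /* assumed instrs per copy  */

        double live      = objects * live_fraction;
        double reclaimed = objects - live;
        double per_dead  = (live * copy_cost) / reclaimed;

        printf("instructions per reclaimed object: %.3f\n", per_dead);
        return 0;
    }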
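
Sketch for Answer #3.2 (dilute): Moler's argument can be made concrete
with an assumed cost model. The sketch below supposes a serial setup
phase of n^2 operations in front of n^3 parallelizable operations
(roughly a dense matrix multiply); the processor count and the cost
model are my assumptions, not Moler's own example.

    #include <stdio.h>

    /* With serial cost n^2 and parallel cost n^3, the Amdahl fraction is
     * alpha = n^2 / (n^2 + n^3) = 1/(1+n), which goes to zero as n grows,
     * so the Amdahl bound 1/(alpha + (1-alpha)/P) approaches P.
     */
    int main(void)
    {
        const double P = 1024.0;            /* assumed processor count */
        for (int n = 10; n <= 100000; n *= 10) {
            double serial   = (double)n * n;
            double parallel = (double)n * n * n;
            double alpha    = serial / (serial + parallel);
            double speedup  = 1.0 / (alpha + (1.0 - alpha) / P);
            printf("n = %6d   alpha = %.6f   speedup bound = %.1f\n",
                   n, alpha, speedup);
        }
        return 0;
    }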
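
Sketch for Answer #3.3 (loop restructuring): loop interchange in its
usual textbook form. The array names and the particular recurrence are
my own example; the point is only that the two versions compute the same
values while the second has a dependence-free, vectorizable inner loop.

    #include <stdio.h>

    #define N 256

    static double a[N][N], b[N][N];

    /* Before: the inner loop carries the recurrence (a[i][j] needs
     * a[i][j-1]), so a vectorizer cannot vectorize it.
     */
    static void sweep_before(void)
    {
        for (int i = 0; i < N; i++)
            for (int j = 1; j < N; j++)
                a[i][j] = a[i][j-1] + b[i][j];
    }

    /* After interchange: the recurrence is carried by the OUTER j loop;
     * the inner loop over i touches independent rows and vectorizes.
     * Both versions compute exactly the same result.
     */
    static void sweep_after(void)
    {
        for (int j = 1; j < N; j++)
            for (int i = 0; i < N; i++)
                a[i][j] = a[i][j-1] + b[i][j];
    }

    int main(void)
    {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) { a[i][j] = 0.0; b[i][j] = 1.0; }
        sweep_after();                 /* same answer as sweep_before() */
        printf("a[%d][%d] = %g\n", N-1, N-1, a[N-1][N-1]);  /* 255 */
        return 0;
    }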
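
Sketch for Answer #3.4: trace scheduling itself needs a compiler (it
reorders operations across basic-block boundaries), but the simpler
optimizations named there (hoisting loop invariants, computing repeated
expressions once) look like this when done by hand. The function and
variable names are illustrative.

    #include <math.h>
    #include <stdio.h>

    #define N 1000

    /* Before: sqrt(scale) is loop-invariant, and (x*x) is computed
     * twice per trip.
     */
    static double work_before(const double *v, double x, double scale)
    {
        double s = 0.0;
        for (int i = 0; i < N; i++)
            s += v[i] * sqrt(scale) + (x * x) * v[i] - (x * x);
        return s;
    }

    /* After: the invariant is hoisted out of the loop and the repeated
     * expression is computed once.  Same result, less work per trip.
     */
    static double work_after(const double *v, double x, double scale)
    {
        double s = 0.0;
        double root = sqrt(scale);     /* hoisted loop invariant */
        double xx   = x * x;           /* common subexpression   */
        for (int i = 0; i < N; i++)
            s += v[i] * root + xx * v[i] - xx;
        return s;
    }

    int main(void)
    {
        double v[N];
        for (int i = 0; i < N; i++) v[i] = 1.0;
        printf("%f %f\n", work_before(v, 2.0, 9.0),
                          work_after(v, 2.0, 9.0));
        return 0;
    }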
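
Sketch for Answer #3.5 (precompute): a minimal sine table. The table
size and the wrap-around handling are arbitrary choices of mine; a real
table would likely interpolate between entries.

    #include <math.h>
    #include <stdio.h>

    #define PI         3.14159265358979323846
    #define TABLE_SIZE 1024              /* assumed table granularity */

    static double sine_table[TABLE_SIZE];

    /* Pay for the library sin() once, up front. */
    static void init_sine_table(void)
    {
        for (int i = 0; i < TABLE_SIZE; i++)
            sine_table[i] = sin(2.0 * PI * i / TABLE_SIZE);
    }

    /* Afterwards each "call" is an index computation plus one load,
     * at the price of table-resolution accuracy.  Assumes x >= 0.
     */
    static double fast_sin(double x)
    {
        int i = (int)(x / (2.0 * PI) * TABLE_SIZE) % TABLE_SIZE;
        return sine_table[i];
    }

    int main(void)
    {
        init_sine_table();
        printf("table: %f   libm: %f\n", fast_sin(1.0), sin(1.0));
        return 0;
    }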