sccowan@watmsg.uwaterloo.ca (S. Crispin Cowan) (06/05/90)
csimmons@jewel.oracle.com (Charles Simmons) writes: >In article <1990Jun3.145408.2472@watmath.waterloo.edu>, >sccowan@watmsg.uwaterloo.ca (S. Crispin Cowan) writes: [me claiming that super-linear speedup is impossible except for enhanced memory bandwidth] >Yah... Hopefully obviously, I'm not real interested in the speedup >effects caused by having more caches and more main memory bandwidth. >Also, obviously, I can simulate any multiprocessor algorithm on a >single processor by using time sharing techniques. Clearly, adding >the time sharing techniques would cause a certain amount of overhead >to occur. So I'm still interested in the question of: To what extent >have researchers found algorithms that run much faster when you explore >multiple possible solutions at the same time. If you're going to stick to just algorithms, then the theorum definately holds: it is NOT POSSIBLE to get super-linear speedup. CAN'T HAPPEN. Thumnail proof: Proof by contradiction; assume that there exists an algorithm that exibits some super-linear speedup on P processors. That means that the best possible time for this problem on a uniprocessor is O(s(n)) and on a paralle processor is O(p(n)), such that O(p(n)) < O(s(n)/P) for P parallel processors. Now I simulate this parallel algorithm on my uni-processor system using time-slicing to simulate parallelism. My context switching overhead is some linear function of c(s(n)) (I context switch c% of the time). So now my uniprocessor time complexity for this problem is now O(c(p(n))) = O(p(n)) since c is a linear factor. Now my sequential performance is within a linear factor of my parallel performance, which means that you no longer have super-linear speedup. Contradiction, QED, etc. What might be observed, and is occasionally used in sales literature (as explained by Hennessy & Pattersonin in their new book) is super-linear slow-down. Super-linear slowdown is, as you describe, running a parallel algorithm on one processor (or at least fewer than it can exploit). Then the algorithm may run more than P times slower than the fully parallel version due to context switching overhead. What is even more likely, however, is that the parallel algorithm is un-able to make use of some information that a serial algorithm can use, because that important information is being generated concurrently, and may not be ready when you need it. This was explained by another poster with regard to the alpha-beta search problem. Summary: In the realm of algorithms, asking for observations of super-linear speedup is like asking for observations of faster-than-light travel, only less likely. In the realm of architecture, super-linear speedup can occur due to interesting noise-like effects involving memory bandwidth, but this is unlikely to exceed linear speedup in any substantial way. Most parallel systems don't even approach linear speedup. This is the holy grail; scalable, general parallel linear speedup. ---------------------------------------------------------------------- (S.) Crispin Cowan, CS grad student, University of Waterloo Office: DC3548 x3934 Home phone: 570-2517 Post Awful: 60 Overlea Drive, Kitchener, N2M 1T1 UUCP: watmath!watmsg!sccowan Domain: sccowan@watmsg.waterloo.edu "You can have peace. Or you can have freedom. Don't ever count on having both at once." -Lazarus Long "You can't separate peace from freedom because no one can be at peace unless he has his freedom." -Malcolm X