[comp.arch] Re^4: Superlinear Speedup

sccowan@watmsg.uwaterloo.ca (S. Crispin Cowan) (06/05/90)
csimmons@jewel.oracle.com (Charles Simmons) writes:
>In article <1990Jun3.145408.2472@watmath.waterloo.edu>,
>sccowan@watmsg.uwaterloo.ca (S. Crispin Cowan) writes:
[me claiming that super-linear speedup is impossible except for enhanced
memory bandwidth]

>Yah...  Hopefully obviously, I'm not real interested in the speedup
>effects caused by having more caches and more main memory bandwidth.

>Also, obviously, I can simulate any multiprocessor algorithm on a
>single processor by using time sharing techniques.  Clearly, adding
>the time sharing techniques would cause a certain amount of overhead
>to occur.  So I'm still interested in the question of:  To what extent
>have researchers found algorithms that run much faster when you explore
>multiple possible solutions at the same time.

If you're going to stick to just algorithms, then the theorum definately
holds:  it is NOT POSSIBLE to get super-linear speedup.  CAN'T HAPPEN.

Thumnail proof:  Proof by contradiction; assume that there exists an
algorithm that exibits some super-linear speedup on P processors.  That
means that the best possible time for this problem on a uniprocessor is
O(s(n)) and on a paralle processor is O(p(n)), such that

	O(p(n)) < O(s(n)/P)

for P parallel processors.

Now I simulate this parallel algorithm on my uni-processor system using
time-slicing to simulate parallelism.  My context switching overhead is
some linear function of c(s(n)) (I context switch c% of the time).  So
now my uniprocessor time complexity for this problem is now

	O(c(p(n))) = O(p(n))	since c is a linear factor.

Now my sequential performance is within a linear factor of my parallel
performance, which means that you no longer have super-linear speedup.
Contradiction, QED, etc.

What might be observed, and is occasionally used in sales literature (as
explained by Hennessy & Pattersonin in their new book) is super-linear
slow-down.  Super-linear slowdown is, as you describe, running a
parallel algorithm on one processor (or at least fewer than it can
exploit).  Then the algorithm may run more than P times slower than the
fully parallel version due to context switching overhead.  What is even
more likely, however, is that the parallel algorithm is un-able to make
use of some information that a serial algorithm can use, because that
important information is being generated concurrently, and may not be
ready when you need it.  This was explained by another poster with
regard to the alpha-beta search problem.

Summary:  In the realm of algorithms, asking for observations of
super-linear speedup is like asking for observations of
faster-than-light travel, only less likely.  In the realm of
architecture, super-linear speedup can occur due to interesting
noise-like effects involving memory bandwidth, but this is unlikely to
exceed linear speedup in any substantial way.  Most parallel systems
don't even approach linear speedup.  This is the holy grail; scalable,
general parallel linear speedup.

----------------------------------------------------------------------
(S.) Crispin Cowan, CS grad student, University of Waterloo
Office:		DC3548	x3934		Home phone: 570-2517
Post Awful:	60 Overlea Drive, Kitchener, N2M 1T1
UUCP:		watmath!watmsg!sccowan
Domain:		sccowan@watmsg.waterloo.edu

"You can have peace.  Or you can have freedom.  Don't ever count on
having both at once."
	-Lazarus Long
"You can't separate peace from freedom because no one can be at peace
unless he has his freedom."
	-Malcolm X