[comp.parallel] Super-unitary speedup

ephraim@THINK.COM (12/12/88)
After following the discussion of super-linear performance improvement
(or its absence) on parallel machines, it struck me that I've been
sitting on a possible instance.  It hadn't occurred to me to publish,
perhaps because it's a real application and not a research project
:-).

The task is looking up words in a dictionary.  Like most dictionaries,
the dictionary contents are sorted in alphabetical order.  Like most
words, the words to be looked up are not.  The algorithm used is a
variation of binary search called Parallel Binary Search (PBS) which
searches for multiple words at once.  PBS was devised by Craig
Stanfill and others at Thinking Machines.  The algorithm was debugged
and implemented by yours truly.  Needless to say, the algorithm is
designed for efficient implementation on a Connection Machine.

We have a dictionary of D unique entries memory-resident in a P
processor Connection Machine.  We have a total of W words to look up.
In practice, D >> P and W >> P.  That is, there are several dictionary
entries in each processor (D/P) and there are more words to look up
than can be accomodated in one pass.  Each use of PBS can look up 1 <=
w <= P words in the dictionary, so at least W/P calls are needed to
look up all of the words.

The cost of PBS is nearly independent of w, the number of words looked
up in one call.  One component of the cost per call is proportional to
D/P, another is proportional to log2(P).  So in practice, we use w ~=
P for greatest efficiency.

Consider what happens as P is increased.  The number of calls to PBS
is reduced in direct proportion to P, giving us a nearly linear
performance improvement.  (It's slightly sub-linear because of the
log2(P) term.)  But, D/P is also decreased, giving us an additional
improvement.

In an experiment carried out in mid-September, I had the following
results:

    Machine size (P)   CPU time
         8192            165 S
        16384             54 S
        32768             20 S