[comp.arch] Will the Karp Problem Be Solved?

dgreen@zeus.cs.ucla.edu (Dan R. Greening) (12/05/86)

Reprinted without permission from the November 1986 SIAM News.  SIAM
is the Society for Industrial and Applied Mathematics.


WILL THE KARP PROBLEM BE SOLVED?

The following item has been circulating for almost a year on various
computer networks.  Because the author of the challenge is quite
confident about not having to pay any winner, SIAM News decided that
to really test the problem we should help to give the challenge wider
circulation.  The challenger is Alan Karp, IBM Scientific Research
Center 1530 Page Mill Road, Palo Alto, CA  94304

---

  I attended in 1985 the Second SIAM Conference on Parallel Processing
for Scientific Computing in Norfolk, Virginia, where I heard about
1,000 processor systems, 4,000 processor systems, and even a proposed
1,000,000 processor system.  Since I wonder if such systems are the
best way to do general-purpose, scientific computing, I am making the
following offer:

  I will pay $100 to the first person to demonstrate a speed-up of at
least 200 on a general purpose, MIMD computer used for scientific
computing.  This offer will be withdrawn at 11:59 PM on 31 December
1995.

  Some definitions are in order.
  SPEED-UP:  By speed-up I mean the time taken to run an application
on one processor using the best sequential algorithm divided by the
time to run the same application on N processors using the best
parallel algorithm.
  GENERAL PURPOSE:  The parallel system should be able to run a wide
variety of applications.  For the purposes of this test, the machine
must run three different programs that demonstrate its ability to
solve different problems.  I suggest a large, dynamic structures
calculation, a trans-sonic fluid flow past a complex barrier, and a
large problem in econometric modeling.  These are only suggestions;  I
will be quite flexible in the choice of the problems.  Several people
have volunteered to adjudicate disputes in the selection of the
applications.
  APPLICATIONS:  The problems run for this test must be complete
applications--no computational kernels are allowed.  They must contain
all input, data-transfers from host to parallel processors, and all
output.  The problems chosen should be the kind of job that a working
scientist or engineer would submit as a batch job to a large
supercomputer.  In addition, I am arbitrarily disqualifying all
problems that Cleve Moler calls ``embarrassingly parallel.''  These
include signal processing with multiple independent data streams, the
computation of the Mandelbrot set, etc.
  There are some rather obvious ground rules.
  1. Simulations of the hardware are not permitted.  I am looking for
a demonstration on a running piece of hardware.
  2. The same problem should be run on the sequential and parallel
processors.
  3. It is not fair to cripple the sequential processor.  For example,
if your operating system uses 99% of one processor, the single
processor run will spend all its time in the operating system.
Super-linear speed-up as the number of processors is increased is
evidence of this problem.
  4. It is not fair to memory starve the single processor run.  If you
have 100K words of memory on each of 1,000 processors, and you run a
10 MW problem, it is not fair for the sequential run to be made on a
100 KW processor.  After all, we are not interested in seeing the
impact of additional memory; we want to see how much the extra CPUs
help.

  It may not be possible to follow all these additional rules.  For
example, you can't be expected to build a single processor with lots
of extra memory or write a new operating system just for this test.
However, you should do your best to make the demonstration fair.  A
third party has agreed to adjudicate any scaling disputes.
  Anyone claiming this prize will be expected to present his or her
results at a suitable conference.  At the end of the presentation, I
will have the audience vote on whether the demonstration was
successful.  Remember, the purpose of the bet is to force developers
to demonstrate the utility of massively parallel systems, not to show
they can find holes in the rules.

Alan Karp is on the staff of IBM's Scientific Center in Palo Alto.
His challenge is a personal one, and does not reflect the position of
his employer.

----

While I can think of easier ways to earn a cool hundred, the challenge
seems to involve all sorts of people, so I thought I would pass it on
to you.

Well, Computer Scientists ...

  (hands on hips, looking down, frowning, tapping foot)

... why HAVEN'T we gotten the hundred bucks yet?


 _D_a_n_ _G_r_e_e_n_i_n_g     _A_R_P_A-  dgreen@CS.UCLA.EDU
                  _U_U_C_P- ..!{sdcrdcf,ihnp4,trwspp,ucbvax}!ucla-cs!dgreen

bs@linus.UUCP (Robert D. Silverman) (12/06/86)

At the Crypto '86 conference I presented a new variation of the
Quadratic Sieve algorithm for factoring large integers. The variation
has been programmed on a STAR configuation of SUN-3's using ethernet
connections. I can demonstrate that N machines run N times as fast
(less perhaps 1-2%) as 1 machine. The algorithm requires miniscule I/O
between the central host and its satellites and it partitions itself
in a natural way such that each satellite can work totally independently
of the others. 

I don't have 200 SUN-3's to hook together, however.
 
I have submitted a paper describing the work to the Journal of
Supercomputing.
 

Bob Silverman

ian@loral.UUCP (12/10/86)

In article <144@linus.UUCP> bs@linus.UUCP (Robert D. Silverman) writes:
>
>
>At the Crypto '86 conference I presented a new variation of the
>Quadratic Sieve algorithm for factoring large integers. The variation
>has been programmed on a STAR configuation of SUN-3's using ethernet
>connections. 

  Although this many be an interesting parallel algorithm, it is far
  from meeting Karp's challenge.  The point of Karp's challenge is that
  you must have a _general_ purpose parallel programming system - at least
  general purpose in the domain of numeric computation.  The programs that
  Dr. Karp proposes are much more complex than the program mentioned above.

  I have written parallel sorts and matrix multiplies for the Loral
  dataflow processor, but these are toy programs compared to what Karp
  proposes.  One of the problems with Karp's challenge (as others have
  pointed out) is that it will not be met unless someone foots the bill for
  a rather prodigious software development effort.  For parallel machines
  of the scale that Karp's challenge covers (e.g., 100 or more processing
  elements) software development is harder than it is for sequential
  processors.  For example, it took me a week to get a parallel matrix
  multiply program written and debugged on the Loral dataflow system.
  On a sequential processor this would be a trivial task.

  I believe that better programming environments will emerge, but they are
  a few years away.  Karp's challenge will be met, if it is met at all, as
  a side effect of another project.

		     Ian Kaplan
		     Loral Dataflow Group
		     Loral Instrumentation
	     USENET: {ucbvax,decvax,ihnp4}!sdcsvax!loral!ian
	     ARPA:   sdcc6!loral!ian@UCSD
	     USPS:   8401 Aero Dr. San Diego, CA 92123

segall@caip.UUCP (12/10/86)

> >At the Crypto '86 conference I presented a new variation of the
> >Quadratic Sieve algorithm for factoring large integers. The variation
> >has been programmed on a STAR configuation of SUN-3's using ethernet
> >connections. 


>   Although this many be an interesting parallel algorithm, it is far
>   from meeting Karp's challenge.  The point of Karp's challenge is that
>   you must have a _general_ purpose parallel programming system - at least
>   general purpose in the domain of numeric computation.  The programs that
>   Dr. Karp proposes are much more complex than the program mentioned above.

Another point is that this doesn't even fit into the class of problems
which Karp has allowed. He only wants to consider solutions to
problems which are not "trivially parallelizable". Being trivially
parallelizable does not mean that a problem is trivial, but that there
is a way to execute a parallel solution to it which is no more
involved than simply using a bunch of processors independantly, or
almost independantly. Your solution seems to fit into that category. I
would think that any problem of useful size whose communication can be
handled without significant penalty by a single ethernet connecting
some Suns together has trivial communication requirements.


I agree with Ian that any system which meets this challenge will be
part of a large project, and will only incidentally be entered in the
contest. However, it also serves as a little bit of inspiration to
those of us who hope to eventually meet the challenge. 

rb@cci632.UUCP (Rex Ballard) (12/12/86)

The Karp challenge, at first looks like a very simple challenge
to beat.  But after looking at the rules more carefully, I could
say it's a "rigged game".

There are two analogies here.  In the first analogy, the most
popular, you are building a house.  If it takes 1 man 1 year
to build the house, but it takes 2 men 6 months to build the
house, then it should take 365 men 1 day to build the house,
and 3650 men, little more than an hour to build the house.
Karp has generously not limited the number of processors
required to get his 100 fold increase, but I can referr to
the "mythical man-month" for the problems with such increases.

In the other anology, we have an assembly line.  In this case,
it does become possible to organize a group more efficiently,
and productivity does go up.  Because all of the workers are
able to do work at the same time, and time is saved searching
for tools, it is "theoretically possible" for the line to
build a car that takes 1 man-year to build in an average of
1 day.  This is because many portions of the pipeline are
also parallel, for example, the four wheels can be mounted
at the same time.

Karps test however, limits the number of tools available,
and then asks for the actual time required to build 1 car.
While it might still be possible to build the car in a month,
because only one "cylinder maker" is allowed,..., even with
365 men, the increase will not approach 100-fold.

Our company, like several others, does have distributed processing
networks incorporating well over 1000 processors, and yet the
speed-up factor for a single job, running through the network,
is not significantly faster when compared to the actual execution
time (less OS overhead) of a single processor system.  The important
difference is that the system can handle 1000 jobs before serious
degradation over single job performance occurrs.

If one carefully reads the rules, they will find that Karps test
is quite simply a "rigged game".

eugene@pioneer.arpa (Eugene Miya N.) (12/13/86)

Just a short note.
I am taking the liberty to forward all the postings about the Karp
Challenge to Alan Karp (karp@ibm.com).  If you care to read about
the challenge, send mail to netlib@anl-mcs.arpa (I assume you know how
to use netlib) using the keywords benchmark and karp-challenge.

From the Rock of Ages Home for Retired Hackers:

--eugene miya
  NASA Ames Research Center
  eugene@ames-aurora.ARPA
  "You trust the `reply' command with all those different mailers out there?"
  {hplabs,hao,nike,ihnp4,decwrl,allegra,tektronix,menlo70}!ames!aurora!eugene