dgreen@zeus.cs.ucla.edu (Dan R. Greening) (12/05/86)
Reprinted without permission from the November 1986 SIAM News. SIAM is the Society for Industrial and Applied Mathematics. WILL THE KARP PROBLEM BE SOLVED? The following item has been circulating for almost a year on various computer networks. Because the author of the challenge is quite confident about not having to pay any winner, SIAM News decided that to really test the problem we should help to give the challenge wider circulation. The challenger is Alan Karp, IBM Scientific Research Center 1530 Page Mill Road, Palo Alto, CA 94304 --- I attended in 1985 the Second SIAM Conference on Parallel Processing for Scientific Computing in Norfolk, Virginia, where I heard about 1,000 processor systems, 4,000 processor systems, and even a proposed 1,000,000 processor system. Since I wonder if such systems are the best way to do general-purpose, scientific computing, I am making the following offer: I will pay $100 to the first person to demonstrate a speed-up of at least 200 on a general purpose, MIMD computer used for scientific computing. This offer will be withdrawn at 11:59 PM on 31 December 1995. Some definitions are in order. SPEED-UP: By speed-up I mean the time taken to run an application on one processor using the best sequential algorithm divided by the time to run the same application on N processors using the best parallel algorithm. GENERAL PURPOSE: The parallel system should be able to run a wide variety of applications. For the purposes of this test, the machine must run three different programs that demonstrate its ability to solve different problems. I suggest a large, dynamic structures calculation, a trans-sonic fluid flow past a complex barrier, and a large problem in econometric modeling. These are only suggestions; I will be quite flexible in the choice of the problems. Several people have volunteered to adjudicate disputes in the selection of the applications. APPLICATIONS: The problems run for this test must be complete applications--no computational kernels are allowed. They must contain all input, data-transfers from host to parallel processors, and all output. The problems chosen should be the kind of job that a working scientist or engineer would submit as a batch job to a large supercomputer. In addition, I am arbitrarily disqualifying all problems that Cleve Moler calls ``embarrassingly parallel.'' These include signal processing with multiple independent data streams, the computation of the Mandelbrot set, etc. There are some rather obvious ground rules. 1. Simulations of the hardware are not permitted. I am looking for a demonstration on a running piece of hardware. 2. The same problem should be run on the sequential and parallel processors. 3. It is not fair to cripple the sequential processor. For example, if your operating system uses 99% of one processor, the single processor run will spend all its time in the operating system. Super-linear speed-up as the number of processors is increased is evidence of this problem. 4. It is not fair to memory starve the single processor run. If you have 100K words of memory on each of 1,000 processors, and you run a 10 MW problem, it is not fair for the sequential run to be made on a 100 KW processor. After all, we are not interested in seeing the impact of additional memory; we want to see how much the extra CPUs help. It may not be possible to follow all these additional rules. For example, you can't be expected to build a single processor with lots of extra memory or write a new operating system just for this test. However, you should do your best to make the demonstration fair. A third party has agreed to adjudicate any scaling disputes. Anyone claiming this prize will be expected to present his or her results at a suitable conference. At the end of the presentation, I will have the audience vote on whether the demonstration was successful. Remember, the purpose of the bet is to force developers to demonstrate the utility of massively parallel systems, not to show they can find holes in the rules. Alan Karp is on the staff of IBM's Scientific Center in Palo Alto. His challenge is a personal one, and does not reflect the position of his employer. ---- While I can think of easier ways to earn a cool hundred, the challenge seems to involve all sorts of people, so I thought I would pass it on to you. Well, Computer Scientists ... (hands on hips, looking down, frowning, tapping foot) ... why HAVEN'T we gotten the hundred bucks yet? _D_a_n_ _G_r_e_e_n_i_n_g _A_R_P_A- dgreen@CS.UCLA.EDU _U_U_C_P- ..!{sdcrdcf,ihnp4,trwspp,ucbvax}!ucla-cs!dgreen
bs@linus.UUCP (Robert D. Silverman) (12/06/86)
At the Crypto '86 conference I presented a new variation of the Quadratic Sieve algorithm for factoring large integers. The variation has been programmed on a STAR configuation of SUN-3's using ethernet connections. I can demonstrate that N machines run N times as fast (less perhaps 1-2%) as 1 machine. The algorithm requires miniscule I/O between the central host and its satellites and it partitions itself in a natural way such that each satellite can work totally independently of the others. I don't have 200 SUN-3's to hook together, however. I have submitted a paper describing the work to the Journal of Supercomputing. Bob Silverman
ian@loral.UUCP (12/10/86)
In article <144@linus.UUCP> bs@linus.UUCP (Robert D. Silverman) writes: > > >At the Crypto '86 conference I presented a new variation of the >Quadratic Sieve algorithm for factoring large integers. The variation >has been programmed on a STAR configuation of SUN-3's using ethernet >connections. Although this many be an interesting parallel algorithm, it is far from meeting Karp's challenge. The point of Karp's challenge is that you must have a _general_ purpose parallel programming system - at least general purpose in the domain of numeric computation. The programs that Dr. Karp proposes are much more complex than the program mentioned above. I have written parallel sorts and matrix multiplies for the Loral dataflow processor, but these are toy programs compared to what Karp proposes. One of the problems with Karp's challenge (as others have pointed out) is that it will not be met unless someone foots the bill for a rather prodigious software development effort. For parallel machines of the scale that Karp's challenge covers (e.g., 100 or more processing elements) software development is harder than it is for sequential processors. For example, it took me a week to get a parallel matrix multiply program written and debugged on the Loral dataflow system. On a sequential processor this would be a trivial task. I believe that better programming environments will emerge, but they are a few years away. Karp's challenge will be met, if it is met at all, as a side effect of another project. Ian Kaplan Loral Dataflow Group Loral Instrumentation USENET: {ucbvax,decvax,ihnp4}!sdcsvax!loral!ian ARPA: sdcc6!loral!ian@UCSD USPS: 8401 Aero Dr. San Diego, CA 92123
segall@caip.UUCP (12/10/86)
> >At the Crypto '86 conference I presented a new variation of the > >Quadratic Sieve algorithm for factoring large integers. The variation > >has been programmed on a STAR configuation of SUN-3's using ethernet > >connections. > Although this many be an interesting parallel algorithm, it is far > from meeting Karp's challenge. The point of Karp's challenge is that > you must have a _general_ purpose parallel programming system - at least > general purpose in the domain of numeric computation. The programs that > Dr. Karp proposes are much more complex than the program mentioned above. Another point is that this doesn't even fit into the class of problems which Karp has allowed. He only wants to consider solutions to problems which are not "trivially parallelizable". Being trivially parallelizable does not mean that a problem is trivial, but that there is a way to execute a parallel solution to it which is no more involved than simply using a bunch of processors independantly, or almost independantly. Your solution seems to fit into that category. I would think that any problem of useful size whose communication can be handled without significant penalty by a single ethernet connecting some Suns together has trivial communication requirements. I agree with Ian that any system which meets this challenge will be part of a large project, and will only incidentally be entered in the contest. However, it also serves as a little bit of inspiration to those of us who hope to eventually meet the challenge.
rb@cci632.UUCP (Rex Ballard) (12/12/86)
The Karp challenge, at first looks like a very simple challenge to beat. But after looking at the rules more carefully, I could say it's a "rigged game". There are two analogies here. In the first analogy, the most popular, you are building a house. If it takes 1 man 1 year to build the house, but it takes 2 men 6 months to build the house, then it should take 365 men 1 day to build the house, and 3650 men, little more than an hour to build the house. Karp has generously not limited the number of processors required to get his 100 fold increase, but I can referr to the "mythical man-month" for the problems with such increases. In the other anology, we have an assembly line. In this case, it does become possible to organize a group more efficiently, and productivity does go up. Because all of the workers are able to do work at the same time, and time is saved searching for tools, it is "theoretically possible" for the line to build a car that takes 1 man-year to build in an average of 1 day. This is because many portions of the pipeline are also parallel, for example, the four wheels can be mounted at the same time. Karps test however, limits the number of tools available, and then asks for the actual time required to build 1 car. While it might still be possible to build the car in a month, because only one "cylinder maker" is allowed,..., even with 365 men, the increase will not approach 100-fold. Our company, like several others, does have distributed processing networks incorporating well over 1000 processors, and yet the speed-up factor for a single job, running through the network, is not significantly faster when compared to the actual execution time (less OS overhead) of a single processor system. The important difference is that the system can handle 1000 jobs before serious degradation over single job performance occurrs. If one carefully reads the rules, they will find that Karps test is quite simply a "rigged game".
eugene@pioneer.arpa (Eugene Miya N.) (12/13/86)
Just a short note. I am taking the liberty to forward all the postings about the Karp Challenge to Alan Karp (karp@ibm.com). If you care to read about the challenge, send mail to netlib@anl-mcs.arpa (I assume you know how to use netlib) using the keywords benchmark and karp-challenge. From the Rock of Ages Home for Retired Hackers: --eugene miya NASA Ames Research Center eugene@ames-aurora.ARPA "You trust the `reply' command with all those different mailers out there?" {hplabs,hao,nike,ihnp4,decwrl,allegra,tektronix,menlo70}!ames!aurora!eugene