Ray@hubcap.UUCP (07/16/87)
In article <270@hubcap.UUCP> fpst@hubcap.UUCP (Dennis Stevenson) writes: > >One such 'application' that may be interesting was described in the >April 1987 edition of IEEE Computer. "Multiprocessing the Sieve of >Eratosthenes" by S.H.Bokhari. This uses a FLEX-32 shared memory >machine to implement a novel prime number sieve benchmark. > We have implemented the "Seive of Erathosthenes" on the Intel Hypercube as follows: The tablet of N numbers is divided evenly amongst the p processors (p can be any value up to the maximum number of processors in the configuration). Processor 0 contains the first N/p numbers, 1 contains the next, etc. Processor zero is in charge and will have all of the SQRT(N) first primes. Zero finds the prime and before sieving broadcasts the value to all participating processors. Each processor (including zero) then strikes out those multiples of the prime which belong to him. Once zero has finished striking, he finds the next prime and broadcasts it, etc. Each processor has an equivalent number of numbers and they all seive with the same prime at the same time. The work is balanced. Once zero has found the first SQRT(N) primes, seiving is finished and each processor can search through his subset of numbers and report the primes, again a perfectly balanced computation. This code has been implemented in FORTRAN and runs on an iPSC */MX (* = 4,5 or 6). -- Ray Asbury CSNET: ray@isc.intel.com Intel Scientific Computers 15201 NW Greenbrier Parkway Beaverton, Oregon 97006 (503) 629-7641
gwu@vlsi.cs.cmu.edu (George Wu) (07/24/87)
The problem with Ray's algorithm is that every processor is going to receive N broadcasts (somewhat fewer, since some numbers are already crossed out) to cross out the multiples of that number. And it will have to examine N/p numbers to determine if it's a multiple. And the first number found to be a multiple will have to be obtained by doing at least one division/modulo. How about this, starting at the first element of the list, a processor traverses the list, crossing out all multiples of the first number. As the list is processed, all numbers not crossed out are broadcast to the next processor in the "pipeline", to form it's list. The next processor repeats this procedure on the smaller list, and so on. Granted, since the last processor receives a smaller list, it spends much of it's time idling. The total time is N plus the time it takes for the last number to propogate through all the processors. If N is greater than p, then things slow down somewhat. The pth processor must wait until processor zero is done, before sending it's list. I guess this makes the execution time N^2/p. To give credit where it's due, I didn't think this algorithm up. A friend and coworker, Joe Golton, did. George (and Joe) [ This analysis brings up a question about "good." In the third generaton mentality, "idle" == "bad." Is that still true? Or are there better measures? Steve Stevenson - the moderator ]