[comp.hypercube] communications intensive applications

Ray@hubcap.UUCP (07/16/87)

In article <270@hubcap.UUCP> fpst@hubcap.UUCP (Dennis Stevenson) writes:
>
>One such 'application' that may be  interesting was  described in the
>April 1987 edition of IEEE Computer.   "Multiprocessing  the Sieve of
>Eratosthenes"  by  S.H.Bokhari.   This  uses a  FLEX-32 shared memory
>machine to implement a novel prime number sieve benchmark.
>

We have implemented the "Seive of Erathosthenes" on the Intel Hypercube as
follows:

The tablet of N numbers is divided evenly amongst the p processors (p can
be any value up to the maximum number of processors in the configuration).
Processor 0 contains the first N/p numbers, 1 contains the next, etc.

Processor zero is in charge and will have all of the SQRT(N) first primes.
Zero finds the prime and before sieving broadcasts the value to all
participating processors. Each processor (including zero) then strikes out
those multiples of the prime which belong to him. Once zero has finished
striking, he finds the next prime and broadcasts it, etc. Each processor
has an equivalent number of numbers and they all seive with the same prime
at the same time. The work is balanced.

Once zero has found the first SQRT(N) primes, seiving is finished and each
processor can search through his subset of numbers and report the primes,
again a perfectly balanced computation.

This code has been implemented in FORTRAN and runs on an iPSC */MX (* = 4,5
or 6).

-- 

Ray Asbury				CSNET: ray@isc.intel.com
Intel Scientific Computers
15201 NW Greenbrier Parkway
Beaverton, Oregon 97006
(503) 629-7641

gwu@vlsi.cs.cmu.edu (George Wu) (07/24/87)

     The problem with Ray's algorithm is that every processor is going to
receive N broadcasts (somewhat fewer, since some numbers are already crossed
out) to cross out the multiples of that number. And it will have to examine
N/p numbers to determine if it's a multiple. And the first number found to
be a multiple will have to be obtained by doing at least one division/modulo.

     How about this, starting at the first element of the list, a processor
traverses the list, crossing out all multiples of the first number. As the
list is processed, all numbers not crossed out are broadcast to the next
processor in the "pipeline", to form it's list. The next processor repeats
this procedure on the smaller list, and so on.

     Granted, since the last processor receives a smaller list, it spends
much of it's time idling. The total time is N plus the time it takes for
the last number to propogate through all the processors. If N is greater
than p, then things slow down somewhat. The pth processor must wait until
processor zero is done, before sending it's list. I guess this makes the
execution time N^2/p.

     To give credit where it's due, I didn't think this algorithm up. A
friend and coworker, Joe Golton, did.

						George (and Joe)
[  This analysis brings up a question about "good."  In the third
generaton mentality, "idle" == "bad."  Is that still true?  Or are there
better measures?
		Steve Stevenson - the moderator
]