[comp.hypercube] Sieve of Erastosthenes

ray@intelisc.UUCP (Ray Asbury) (07/29/87)

In article <332@hubcap.UUCP> gwu@vlsi.cs.cmu.edu (George Wu) writes:
>
>     The problem with Ray's algorithm is that every processor is going to
>receive N broadcasts (somewhat fewer, since some numbers are already crossed
>out) to cross out the multiples of that number. And it will have to examine
>N/p numbers to determine if it's a multiple. And the first number found to
>be a multiple will have to be obtained by doing at least one division/modulo.
>

In actuality, only the first primes found with values less than or equal to
sqrt(N) are globally sent. It also turns out that all these values are
found on node 0. (sqrt(N) << N/p) Communications in this example are neglegible.

	while (prime**2 .lt. N)
	if (node 0) then
		find next prime
		global send
		sieve
	else
		receive prime
		sieve
	endif
	endwhile

/* Sieving finished */

	for all nodes
		find and count primes
	end

The code is written in FORTRAN and uses slow subroutine calls to do the bit
manipulation. To sieve and count the primes between 2 and 50 million takes
about ten minutes on a 16 node system. The sieving part and the counting part
take about the same amount of time, 5 minutes apiece. The communication
represents 1 to 2 seconds of this total time.

>     Granted, since the last processor receives a smaller list, it spends
>much of it's time idling. The total time is N plus the time it takes for
>the last number to propogate through all the processors. If N is greater
>than p, then things slow down somewhat. The pth processor must wait until
>processor zero is done, before sending it's list. I guess this makes the
>execution time N^2/p.
>

The algorithm I've presented is perfectly balanced. Each processor does
exactly the same amount of work and they do it all simultaneously.
-- 

Ray Asbury				CSNET: ray@isc.intel.com
Intel Scientific Computers
15201 NW Greenbrier Parkway
Beaverton, Oregon 97006
(503) 629-7641