ray@intelisc.UUCP (Ray Asbury) (07/29/87)
In article <332@hubcap.UUCP> gwu@vlsi.cs.cmu.edu (George Wu) writes: > > The problem with Ray's algorithm is that every processor is going to >receive N broadcasts (somewhat fewer, since some numbers are already crossed >out) to cross out the multiples of that number. And it will have to examine >N/p numbers to determine if it's a multiple. And the first number found to >be a multiple will have to be obtained by doing at least one division/modulo. > In actuality, only the first primes found with values less than or equal to sqrt(N) are globally sent. It also turns out that all these values are found on node 0. (sqrt(N) << N/p) Communications in this example are neglegible. while (prime**2 .lt. N) if (node 0) then find next prime global send sieve else receive prime sieve endif endwhile /* Sieving finished */ for all nodes find and count primes end The code is written in FORTRAN and uses slow subroutine calls to do the bit manipulation. To sieve and count the primes between 2 and 50 million takes about ten minutes on a 16 node system. The sieving part and the counting part take about the same amount of time, 5 minutes apiece. The communication represents 1 to 2 seconds of this total time. > Granted, since the last processor receives a smaller list, it spends >much of it's time idling. The total time is N plus the time it takes for >the last number to propogate through all the processors. If N is greater >than p, then things slow down somewhat. The pth processor must wait until >processor zero is done, before sending it's list. I guess this makes the >execution time N^2/p. > The algorithm I've presented is perfectly balanced. Each processor does exactly the same amount of work and they do it all simultaneously. -- Ray Asbury CSNET: ray@isc.intel.com Intel Scientific Computers 15201 NW Greenbrier Parkway Beaverton, Oregon 97006 (503) 629-7641