[comp.hypercube] Something for parallel processing

cik@hubcap.UUCP (10/30/87)

[ I saw this on comp.arch and thought it might be of interest.
	Steve ]

Simulation is frequently needed for large problems, and vector or
parallel processors are frequently used for it.  Most of the
computationally efficient, or even moderately efficient, methods of
generating non-uniform random numbers are acceptance-rejection methods.
In these, a processor either accepts, in which case a result is
produced, or rejects, in which case nothing is produced or further
processing is needed; that further processing may merely be applying a
procedure that does not use the original random input.
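
As a concrete illustration of the acceptance-rejection idea, here is a
minimal sketch in C: Marsaglia's polar method for normal variates.
(The function name is mine, and rand() stands in for whatever uniform
generator is actually used.)

    #include <stdlib.h>
    #include <math.h>

    /* Marsaglia's polar method: a classic acceptance-rejection scheme.
       Each trial either accepts (yielding two normal variates) or
       rejects (the point falls outside the unit disc and is discarded). */
    double polar_normal(void)
    {
        static int have_spare = 0;
        static double spare;
        double u, v, s;

        if (have_spare) { have_spare = 0; return spare; }
        do {
            u = 2.0 * rand() / (double)RAND_MAX - 1.0;
            v = 2.0 * rand() / (double)RAND_MAX - 1.0;
            s = u * u + v * v;
        } while (s >= 1.0 || s == 0.0);       /* the rejection step */
        s = sqrt(-2.0 * log(s) / s);
        spare = v * s;
        have_spare = 1;
        return u * s;
    }

Each pass through the loop is a trial that either accepts, yielding two
variates, or rejects and discards its uniforms; on a lock-step parallel
machine, the rejecting processors are the ones left with nothing to
show for the step.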

If the computer could keep only the items where acceptance occurs, or
copy items from the start of a vector into the "holes" (i.e., if 23
processors rejected, the first 23 items of a vector would be used as
the results for those processors), full parallel speed could be
maintained.  Otherwise, the degradation increases with the number of
processors.  This can easily be done on the vector machine CYBER 205,
but it cannot be vectorized on the CRAY-1, and can be vectorized only
with difficulty on some other CRAYs.
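
What the CYBER 205 supplies is, in effect, a hardware "compress" on
vectors.  The loop below is a serial sketch of that operation (the
interface is my own, for illustration); the data-dependent store index
k is precisely what defeats vectorization on the CRAY-1.

    #include <stddef.h>

    /* Serial sketch of a vector "compress": pack the accepted items to
       the front of the output array, leaving no holes.  On the CYBER 205
       this is one hardware vector operation; the data-dependent store
       index k keeps this loop from vectorizing on the CRAY-1. */
    size_t compress(const double *in, const int *accept,
                    size_t n, double *out)
    {
        size_t k = 0;
        for (size_t i = 0; i < n; i++)
            if (accept[i])
                out[k++] = in[i];
        return k;              /* number of accepted items */
    }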

I do not know whether the hardware of parallel processors is capable of
this, but I put it forth as a good idea.  A different situation calls
for a more restrictive use of the same ideas: both of the above
operations must be available, and the order of the items must be
maintained (which the CYBER does).  This occurs when a function
requires different subroutines on different parts of its domain.  Here,
in general, the running time on a parallel processor is the sum of the
evaluation times of the individual subroutines, since every processor
must wait through each subroutine even if only some of the arguments
need it.  This time, too, would be reduced.
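
A sketch of the compress-evaluate-scatter pattern follows, again in C.
The subroutines f_low and f_high and the split point 1.0 are
placeholders, and error handling is omitted.  With hardware compress
and an order-preserving merge, each packed list could run at full
vector speed instead of every processor stepping through both
subroutines.

    #include <stddef.h>
    #include <stdlib.h>
    #include <math.h>

    static double f_low(double x)  { return exp(x); }  /* placeholder */
    static double f_high(double x) { return log(x); }  /* placeholder */

    /* Compress each sub-domain into a packed index list (in order),
       evaluate each list densely, and scatter results back to their
       original positions, so the output order is preserved. */
    void eval_split(const double *x, size_t n, double *y)
    {
        size_t *lo = malloc(n * sizeof *lo);
        size_t *hi = malloc(n * sizeof *hi);
        size_t nlo = 0, nhi = 0;

        for (size_t i = 0; i < n; i++) {      /* compress step */
            if (x[i] < 1.0) lo[nlo++] = i;
            else            hi[nhi++] = i;
        }
        for (size_t j = 0; j < nlo; j++)      /* dense evaluation, branch 1 */
            y[lo[j]] = f_low(x[lo[j]]);
        for (size_t j = 0; j < nhi; j++)      /* dense evaluation, branch 2 */
            y[hi[j]] = f_high(x[hi[j]]);

        free(lo);
        free(hi);
    }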


-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette, IN 47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (ARPA or UUCP) or hrubin@purccvm.bitnet