cik@hubcap.UUCP (10/30/87)
[ I saw this on comp.arch and thought it might be of interest.  Steve ]

Simulation is frequently wanted for large problems, for which vector or
parallel processors are frequently used.  Most of the computationally
efficient, or even moderately efficient, methods of generating non-uniform
random numbers are acceptance-rejection methods.  In these, a processor
either accepts, in which case a result is produced, or rejects, in which
case either nothing is produced or further processing is needed; that
further processing may merely be applying a procedure that does not use
the original random input.

If the computer could keep only the items where acceptance occurs, or
copy items from the start of a vector into the "holes" (i.e., if 23
processors rejected, then the first 23 items of a vector would be used as
the results for those processors), full parallel speed could be
maintained.  Otherwise, the degradation increases with the number of
processors.  This can easily be done on the vector machine CYBER 205, but
cannot be vectorized on the CRAY-1, and only with difficulty on some
other CRAYs.  I do not know if hardware for parallel processors is
capable of this, but I put it forth as a good idea.

A different situation occurs if a function requires different subroutines
at different parts of its domain.  It calls for a more restrictive use of
the same ideas: both of the above operations must be possible, and the
order must be maintained (which the CYBER does).  Here, in general, the
running time on a parallel processor will be the sum of the evaluation
times of the individual subroutines.  This also would be reduced.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette, IN 47907
Phone: (317) 494-6054
hrubin@l.cc.purdue.edu (ARPA or UUCP)  or  hrubin@purccvm.bitnet
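[ The compress-on-acceptance idea can be sketched in software.  Below is
a minimal Python illustration (my own hypothetical code, not from the
original post): batched acceptance-rejection sampling of a half-normal
distribution using an Exp(1) envelope, where each pass keeps only the
accepted candidates and compacts them into the output, much as a
hardware compress instruction would close the holes left by the
rejecting processors.  Steve ]

```python
import math
import random

def half_normal_batch(n, batch=64, rng=random.random):
    """Acceptance-rejection sampling of the half-normal density
    f(x) proportional to exp(-x^2/2), x >= 0, with an Exp(1) envelope.

    Candidates are proposed a whole batch ("vector") at a time; the
    list comprehension plays the role of a hardware compress: rejected
    slots simply disappear and accepted values are packed contiguously.
    """
    out = []
    while len(out) < n:
        m = max(n - len(out), batch)
        # One "vector" pass: propose m Exp(1) candidates Y = -log(U).
        ys = [-math.log(rng()) for _ in range(m)]
        us = [rng() for _ in range(m)]
        # Standard acceptance test for this envelope:
        #   accept Y iff U <= exp(-(Y - 1)^2 / 2).
        accepted = [y for y, u in zip(ys, us)
                    if u <= math.exp(-(y - 1.0) ** 2 / 2.0)]
        out.extend(accepted)  # compaction: only accepted items survive
    return out[:n]
```

On a machine without a compress operation, each rejected slot would
instead force a scalar retry or a masked re-pass over the whole vector,
which is exactly the degradation described above.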