cik@l.cc.purdue.edu (Herman Rubin) (01/02/88)
I would like to clarify my claim that acceptance-rejection methods for
generating random variables may be better suited to some vector machines
than to parallel machines.

On a vector machine, after checking that the input random vectors U and T
(usually U will be uniform, preferably not in floating point, and T will be
exponential) are sufficiently long, the general idea of the procedure is

        X = F(U)
        D = G(U)
        T = H(T,D)

If T[i] satisfies an appropriate condition, accept X[i] and keep T[i];
otherwise replace X[i] with something else (or possibly discard the value
of X[i]) and discard the value of T[i].  There are more sophisticated
possibilities, but this gives the general idea.  Since we are dealing with
random input, it is impossible to predict the length of the resulting
vectors.  Also, this is the vectorized form of the algorithm; with a
sequential processor it would be better to keep using a T value until a
rejection occurs.  (A rough C sketch of this step is appended below.)

There is a considerable difference between vector processors.  On the
CYBER 205, the operations described use only simple machine instructions.
On the CRAY 1, things are quite difficult to vectorize, and since the
calculation of F and G may involve table lookup, scatter-gather procedures
must be available in many cases.  On the CRAYs which have scatter-gather,
if the procedure is highly efficient (small rejection probability), things
are not too bad.  Rejection probabilities may be small (< .01) or somewhat
larger, say between .1 and .4.

On a parallel processor, one can either make sure that each of the
separate computing units produces exactly the same number of items, or
merge the various output vectors.  We also have to put the T values back,
eliding the discarded elements.  Also, one or more of the units may find
itself in a position in which it needs to call a refill subroutine.  This
may lead to unsuspected failures of reproducibility; that is not
_necessarily_ bad.  It can also cause the various processors to take
greatly different times, which is likely to be bad.

There is a _possible_ way out if some good vector instructions are added
to parallel processors.  What is needed are the vector test, merge, and
compress instructions such as are found on the CYBER 205.  (An article in
which I raised this possibility was previously posted.)  If these could be
added to parallel processors, possibly as an independent computing unit,
then, provided that the slightly less efficient vector version is used,
even SIMD machines might be able to do a reasonable job in these
situations.

Another situation in which similar problems might arise is evaluating a
function where different algorithms are required in different parts of the
domain.  Breakup, evaluation of each set by its appropriate routine, and
merge is certainly the best way _if it can be done_.  (A sketch of this
pattern is also appended below.)
--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (ARPA or UUCP) or hrubin@purccvm.bitnet
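
For concreteness, here is a rough C sketch of the vectorized
acceptance-rejection step, with F, G, and H filled in by the familiar
exponential-rejection method for half-normal variates (the particular
choice of distribution is only for illustration; the scheme is the same
for any F, G, H):  X = F(U) = -log(U), D = G(U) = (X-1)^2/2,
T = H(T,D) = T - D, accept when the updated T is nonnegative, and keep
the residual T, which is again exponential.  The scalar loop stands in
for what would be vector arithmetic plus a single compress instruction
on a CYBER 205, or a gather/compress sequence on a CRAY with
scatter-gather.

#include <math.h>
#include <stddef.h>

/* Consumes n candidates from the uniform stream u[] (values in (0,1))
 * and the exponential stream t[].  Accepted X values and the surviving
 * residual T values are compressed to the front of x[] and t[]; the
 * accepted count is returned, and cannot be predicted in advance.     */
size_t halfnormal_block(const double *u, double *t, double *x, size_t n)
{
    size_t m = 0;                                   /* compressed length */
    for (size_t i = 0; i < n; i++) {
        double xi = -log(u[i]);                     /* X = F(U)          */
        double di = 0.5 * (xi - 1.0) * (xi - 1.0);  /* D = G(U)          */
        double ti = t[i] - di;                      /* T = H(T,D)        */
        if (ti >= 0.0) {                            /* acceptance test   */
            x[m] = xi;                              /* keep X[i]         */
            t[m] = ti;                              /* keep residual T   */
            m++;
        }                                           /* else discard both */
    }
    return m;
}

For this choice of F, G, and H the rejection probability is about .24,
in the ".1 to .4" range mentioned above, so the output vectors come out
noticeably, and unpredictably, shorter than the inputs.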
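
A rough sketch of the breakup / evaluate / merge pattern for the last
situation can be written the same way.  The cutoff at 1.0 and the two
branch routines are placeholders; the index bookkeeping in the loops is
what the CYBER 205 compress and merge instructions, or scatter-gather on
the CRAYs that have it, would do directly on vectors.

#include <stdlib.h>
#include <stddef.h>

/* The two branch routines are placeholders for the algorithms used on
 * different parts of the domain.                                       */
static double series_branch(double v)     { return v - v*v*v/6.0; }
static double asymptotic_branch(double v) { return 1.0 - 1.0/v; }

/* Evaluates out[i] = f(v[i]) where f needs a different algorithm on
 * different parts of its domain: break the index set up by region,
 * evaluate each subset with its own routine, and merge the results
 * back into place.  (Error checking on malloc is omitted.)            */
void eval_split(const double *v, double *out, size_t n)
{
    size_t *lo = malloc(n * sizeof *lo);    /* indices with v[i] <  1.0 */
    size_t *hi = malloc(n * sizeof *hi);    /* indices with v[i] >= 1.0 */
    size_t nlo = 0, nhi = 0;

    /* breakup: compress the index vector by region of the domain */
    for (size_t i = 0; i < n; i++) {
        if (v[i] < 1.0) lo[nlo++] = i;
        else            hi[nhi++] = i;
    }

    /* evaluate each subset by its appropriate routine, then merge back */
    for (size_t j = 0; j < nlo; j++) out[lo[j]] = series_branch(v[lo[j]]);
    for (size_t j = 0; j < nhi; j++) out[hi[j]] = asymptotic_branch(v[hi[j]]);

    free(lo);
    free(hi);
}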