oliver@athena.mit.edu (James D. Oliver III) (11/09/90)
I have a number (order of magnitude 10**6) of random number function values to generate as part of a Monte Carlo simulation using Cray Fortran. The actual variables of interest are independent of the random numbers, so I can use the same series for each simulation. My question is, is it faster in general to regenerate the random function values every time (this part of the code can't be vectorized), or to just read the values in from a file? Obviously, the solution is to run benchmark programs, which I intend to do, but just as a rule of thumb, is I/O considerably slower than calling functions? Also, could someone repost that pointer to a supercomputer optimization/vectorization reference that was made earlier? -- ____________________________ Jim Oliver oliver@athena.mit.edu / joliver@hstbme.mit.edu oliver%mitwccf.BITNET@MITVMA.MIT.EDU
v087mxgb@ubvmsa.cc.buffalo.edu (Shawn E Thompson) (11/09/90)
I posted an excellent generator previously (2^144 numbers) vectorized in an external function...Email me if you are interested
patrick@convex.COM (Patrick F. McGehearty) (11/09/90)
In article <1990Nov8.170334.24579@athena.mit.edu> oliver@athena.mit.edu (James D. Oliver III) writes: >I have a number (order of magnitude 10**6) of random number function values >to generate ... > My question is, is it faster in general to regenerate the random function > values every time (this part of the code can't be vectorized), or to just > read the values in from a file? It depends on how you read the numbers. Using unformatted, direct I/O with large buffers into an array can be quite efficient. Reading each number with a separate READ from an ascii file (thus requiring a string to real conversion) will be quite slow. Whether the unformatted read is as fast as a random number function is implementation dependent and should be easy to benchmark.
oliver@athena.mit.edu (James D. Oliver III) (11/09/90)
I guess I need to make this clear: the random number genertor itself is vectorizable, however, the values of interest to me are *functions* of these randome variables. To be more specific, I'm generating a series of randomly oriented 3-D unit vectors. The fastest way to do this is to generate two random numbers for x and y and calculate z as long as x**2 + y**2 <= 1. This process, however, involves a conditional loop back to the ranf() function and thus is not vectorizable. However, the fact that this method is not vectorizable is just of incidental interest, since it remains the fastest way of doing it. What I'd like to know is, *given* this way of generating my values, should I expect it to be faster than reading the values from a file? Like I said, for this particular case I can run a benchmark, but I was wondering if there was a general rule of thumb for the speed of I/O operations. -- ____________________________ Jim Oliver oliver@athena.mit.edu / joliver@hstbme.mit.edu oliver%mitwccf.BITNET@MITVMA.MIT.EDU
chidsey@smoke.brl.mil (Irving Chidsey) (11/09/90)
In article <1990Nov8.215752.7075@athena.mit.edu> oliver@athena.mit.edu (James D. Oliver III) writes:
<I guess I need to make this clear: the random number genertor itself is
<vectorizable, however, the values of interest to me are *functions* of
<these randome variables. To be more specific, I'm generating a series of
<randomly oriented 3-D unit vectors. The fastest way to do this is to
<generate two random numbers for x and y and calculate z as long as x**2 +
<y**2 <= 1. This process, however, involves a conditional loop back to the
<ranf() function and thus is not vectorizable. However, the fact that this
<method is not vectorizable is just of incidental interest, since it remains
<the fastest way of doing it.
<
<What I'd like to know is, *given* this way of generating my values, should
<I expect it to be faster than reading the values from a file? Like I said,
<for this particular case I can run a benchmark, but I was wondering if
<there was a general rule of thumb for the speed of I/O operations.
<
< Jim Oliver
< oliver@athena.mit.edu / joliver@hstbme.mit.edu
< oliver%mitwccf.BITNET@MITVMA.MIT.EDU
The general rule is 'Keeping the random numbers in a file is faster,
but takes more space; computing them on the fly is compact but time consuming;
it doesn't matter if you don't need many.'
Why not code it so that if ranfilnam is ' ' you compute, if it
contains a name you read from the file; then you can choose one or the other
depending on conditions when you submit the job.
Irv
--
I do not have signature authority. I am not authorized to sign anything.
I am not authorized to commit the BRL, the DOA, the DOD, or the US Government
to anything, not even by implication.
Irving L. Chidsey <chidsey@brl.mil>
efrank@truth.hep.upenn.edu (Ed Frank) (11/09/90)
In article <1990Nov8.215752.7075@athena.mit.edu> oliver@athena.mit.edu (James D. Oliver III) writes: >I guess I need to make this clear: the random number genertor itself is >vectorizable, however, the values of interest to me are *functions* of >these randome variables. To be more specific, I'm generating a series of >randomly oriented 3-D unit vectors. The fastest way to do this is to >generate two random numbers for x and y and calculate z as long as x**2 + >y**2 <= 1. This process, however, involves a conditional loop back to the >ranf() function and thus is not vectorizable. However, the fact that this >method is not vectorizable is just of incidental interest, since it remains >the fastest way of doing it. >____________________________ > Jim Oliver > oliver@athena.mit.edu / joliver@hstbme.mit.edu > oliver%mitwccf.BITNET@MITVMA.MIT.EDU First, it is not clear to me that this is the fastest method. My method of generating random direction cosines is as follows. Let DIR() be the array of direction cosines, and let RANN() be the (uniform) random number generator. PI is 3.14. Then Phi = 2.0 * PI * RANN( dum) ! in polar coord. Phi and cos(theta) Dir(3) = -1.0 + 2.0*RANN( dum) ! follow a flat distribution. Sth = SQRT( 1.0 = Dir(3)**2) ! convert cos(theta) to sin(theta) Dir(1) = Sth * COS( phi) ! now get X and y Dir(2) = Sth * SIN( phi) ! direction cosines will produce the random directions for you. You might want to include this in your benchmark. I think that this will beat your method with respect to speed. Second, I do not completely understand your method. If I am reading your description correctly, you will *NOT* obtain a uniform distribution with your technique. Have you histogrammed the direction cosines produced by your method? The distributions should be flat. Keep in mind that if you distribute points on the surface of a sphere uniformly, and then project that distribution onto a plane, you will not obtain a flat distrubution bounded by x**2+y**2 = 1. Most of the sphere's projected area will be near x**2 + y**2 = 1. In short: Phi is flat from 0 to 2*pi Cos(theta) is flat from -1 to 1 Ed Frank efrank@upenn5.hep.upenn.edu
dodson@convex.COM (Dave Dodson) (11/10/90)
In article <1990Nov8.215752.7075@athena.mit.edu> oliver@athena.mit.edu (James D. Oliver III) writes: >To be more specific, I'm generating a series of >randomly oriented 3-D unit vectors. The fastest way to do this is to >generate two random numbers for x and y and calculate z as long as x**2 + >y**2 <= 1. This process, however, involves a conditional loop back to the >ranf() function and thus is not vectorizable. I'd generate an array of candidates and filter out the inadmissable ones. This could all be vectorized with a SCILIB routine as follows: real x(n),y(n),z(n),r(n) integer indx(n) do 10 i = 1, n x(i) = 2.0 * ranf() - 1.0 y(i) = 2.0 * ranf() - 1.0 z(i) = 2.0 * ranf() - 1.0 r(i) = x(i)**2 + y(i)**2 + z(i)**2 10 continue call whenflt (n,r,1,1.0,indx,nindx) cdir$ appropriate compiler directive to ignore dependencies in the loop. do 20 i = 1, nindx j = indx(i) t = 1.0 / sqrt(r(j)) x(i) = t * x(j) y(i) = t * y(j) z(i) = t * z(j) 20 continue You are using one of our competitors' machines, so I haven't tried this, but I believe it will all vectorize on a Cray, as it does on a Convex. It returns nindx random 3-d unit vectors in (x(i),y(i),z(i),i=1,nindx). P.S. I don't think you can calculate z from x and y and get a truly uniform distribution of unit vectors. You have to generate x, y, and z independently and reject those outside the unit sphere to get uniformity. Alternatively, generate x, y, and z from a Normal distribution and just normalize to unit length, but, of course, it is much harder to get Normal numbers than uniform. ---------------------------------------------------------------------- Dave Dodson dodson@convex.COM Convex Computer Corporation Richardson, Texas (214) 497-4234
oliver@athena.mit.edu (James D. Oliver III) (11/12/90)
In article <32565@netnews.upenn.edu> efrank@truth.hep.upenn.edu (Ed Frank) writes: > First, it is not clear to me that this is the fastest method. My >method of generating random direction cosines is as follows. Let DIR() >be the array of direction cosines, and let RANN() be the (uniform) random >number generator. PI is 3.14. Then > Phi = 2.0 * PI * RANN( dum) ! in polar coord. Phi and cos(theta) > Dir(3) = -1.0 + 2.0*RANN( dum) ! follow a flat distribution. > Sth = SQRT( 1.0 = Dir(3)**2) ! convert cos(theta) to sin(theta) > Dir(1) = Sth * COS( phi) ! now get X and y > Dir(2) = Sth * SIN( phi) ! direction cosines >will produce the random directions for you. You might want to include >this in your benchmark. I think that this will beat your method with >respect to speed. > Second, I do not completely understand your method. If I am reading >your description correctly, you will *NOT* obtain a uniform >distribution with your technique. Have you histogrammed the direction >cosines produced by your method? The distributions should be flat. >Keep in mind that if you distribute points on the surface of a sphere >uniformly, and then project that distribution onto a plane, you will >not obtain a flat distrubution bounded by x**2+y**2 = 1. Most of the >sphere's projected area will be near x**2 + y**2 = 1. >In short: Phi is flat from 0 to 2*pi > Cos(theta) is flat from -1 to 1 Well, I hadn't originally intended to go into this, but your post and several e-mail messages I've recieved indicate that the method of generating the uniform vectors needs to be explained a little better than my admittedly sketchy description. Looking back, I realized that I gave the impression that I directly substituted the values of my two random numbers as x and y, which of course does not give the propert distribution. The method I used was suggested by Marsaglia (Ann. Math. Stat., 43:645-646, 1972). I stumbled onto it entirely by accident, and it seems to be the most elegant method out there. Briefly, it is as follows: 1) Generate two random uniform variates u and v on (-1,1). If S = u**2 + v**2 > 1, then generate new values for u and v. 2) Calculate Q = sqrt(1-S), then x=2.*u*Q y=2.*v*Q z=1.-2.*S The efficiency of step 1) goes as the area of a unit square divided by the area of a unit circle, or 4/pi, so the average number of random numbers generated per success is 8/pi, or 2.55. Thus the method requires 2.55 random function calls plus a square root to produce a vector, as opposed to your method, which requires 2 random function calls, one square root, and two calls to trigonometric functions. Marsaglia provides the proof for this method, which basically results from the fact that the distribution of S is a uniform variate on (0,1) and independent of (u,v). This method is significantly faster than any method I've used requiring trig functions. Although I haven't tried the one you mention above, I'm failry confident that it is faster than that one as well. -- ____________________________ Jim Oliver oliver@athena.mit.edu / joliver@hstbme.mit.edu oliver%mitwccf.BITNET@MITVMA.MIT.EDU
tbb@sumac02.cray.com (Todd Blachowiak) (11/13/90)
In article <1990Nov8.215752.7075@athena.mit.edu>, oliver@athena.mit.edu (James D. Oliver III) writes: |> I guess I need to make this clear: the random number genertor itself is |> vectorizable, however, the values of interest to me are *functions* of |> these randome variables. To be more specific, I'm generating a series of |> randomly oriented 3-D unit vectors. The fastest way to do this is to |> generate two random numbers for x and y and calculate z as long as x**2 + |> y**2 <= 1. This process, however, involves a conditional loop back to the |> ranf() function and thus is not vectorizable. However, the fact that this |> method is not vectorizable is just of incidental interest, since it remains |> the fastest way of doing it. would something like this work for you. x(i) = ranf() y(i) = (1-x(i)*x(i))*ranf() z(i) = 1-x(i)*x(i)-y(i)*y(i) I believe this would vectorize for you and it gets rid of your conditional. Also, you do not have to make any wasted calls to the random number generator. |> |> What I'd like to know is, *given* this way of generating my values, should |> I expect it to be faster than reading the values from a file? Like I said, |> for this particular case I can run a benchmark, but I was wondering if |> there was a general rule of thumb for the speed of I/O operations. |> |> |> -- |> ____________________________ |> Jim Oliver |> oliver@athena.mit.edu / joliver@hstbme.mit.edu |> oliver%mitwccf.BITNET@MITVMA.MIT.EDU -- Todd Blachowiak tbb@cray.com
oliver@athena.mit.edu (James D. Oliver III) (11/13/90)
In article <108474@convex.convex.com>, dodson@convex.COM (Dave Dodson) writes: > real x(n),y(n),z(n),r(n) > integer indx(n) > do 10 i = 1, n > x(i) = 2.0 * ranf() - 1.0 > y(i) = 2.0 * ranf() - 1.0 > z(i) = 2.0 * ranf() - 1.0 > r(i) = x(i)**2 + y(i)**2 + z(i)**2 > 10 continue > call whenflt (n,r,1,1.0,indx,nindx) >cdir$ appropriate compiler directive to ignore dependencies in the loop. > do 20 i = 1, nindx > j = indx(i) > t = 1.0 / sqrt(r(j)) > x(i) = t * x(j) > y(i) = t * y(j) > z(i) = t * z(j) > 20 continue Your algorithm is the same as the original scheme proposed by von Neumann for generating random vectors. It's efficiency is given by the area of a unit cube divided by a unit sphere, so 18/pi=5.73 random numbers plus a square root must be generated for each vector, as opposed to the 2.55 plus a square root for the Marsaglia method I posted in a recent follow-up >You are using one of our competitors' machines, so I haven't tried this, >but I believe it will all vectorize on a Cray, as it does on a Convex. Hey, offer me a price, my code knows no loyalty! :-) >P.S. I don't think you can calculate z from x and y and get a truly uniform >distribution of unit vectors. You have to generate x, y, and z independently >and reject those outside the unit sphere to get uniformity. As I said, Marsaglia devised a clever way around this; see a more recent posting. -- ____________________________ Jim Oliver oliver@athena.mit.edu / joliver@hstbme.mit.edu oliver%mitwccf.BITNET@MITVMA.MIT.EDU
dodson@convex.COM (Dave Dodson) (11/14/90)
In article <1990Nov13.053812.19914@athena.mit.edu> oliver@athena.mit.edu (James D. Oliver III) writes: >Your algorithm is the same as the original scheme proposed by von Neumann >for generating random vectors. It's efficiency is given by the area of a >unit cube divided by a unit sphere, so 18/pi=5.73 random numbers plus a >square root must be generated for each vector, as opposed to the 2.55 plus >a square root for the Marsaglia method I posted in a recent follow-up The intended contribution in my posting was how to vectorize the generation of random unit vectors. I suggest you combine Marsaglia's rejection algorithm with my use of whenfle to produce a bunch of unit vectors in vector mode. With the entire process vectorized, I would expect it to be much faster to generate them than to input them. ---------------------------------------------------------------------- Dave Dodson dodson@convex.COM Convex Computer Corporation Richardson, Texas (214) 497-4234