[comp.lang.fortran] Random Number Functions--Generate versus Input

oliver@athena.mit.edu (James D. Oliver III) (11/09/90)

I have a number (order of magnitude 10**6) of random number function values
to generate as part of a Monte Carlo simulation using Cray Fortran.  The
actual variables of interest are independent of the random numbers, so I
can use the same series for each simulation.  My question is, is it faster
in general to regenerate the random function values every time (this part
of the code can't be vectorized), or to just read the values in from a
file?

Obviously, the solution is to run benchmark programs, which I intend to do,
but just as a rule of thumb, is I/O considerably slower than calling
functions?

Also, could someone repost that pointer to a supercomputer
optimization/vectorization reference that was made earlier?


--
____________________________
	Jim Oliver   
	oliver@athena.mit.edu /	joliver@hstbme.mit.edu
	oliver%mitwccf.BITNET@MITVMA.MIT.EDU

v087mxgb@ubvmsa.cc.buffalo.edu (Shawn E Thompson) (11/09/90)

I posted an excellent generator previously (2^144 numbers)
vectorized in an external function...Email me if you are
interested

patrick@convex.COM (Patrick F. McGehearty) (11/09/90)

In article <1990Nov8.170334.24579@athena.mit.edu> oliver@athena.mit.edu (James D. Oliver III) writes:
>I have a number (order of magnitude 10**6) of random number function values
>to generate ...
>  My question is, is it faster in general to regenerate the random function
> values every time (this part of the code can't be vectorized), or to just
> read the values in from a file?
It depends on how you read the numbers.

Using unformatted, direct I/O with large buffers into an array can be quite
efficient.  Reading each number with a separate READ from an ascii file
(thus requiring a string to real conversion) will be quite slow.
Whether the unformatted read is as fast as a random number function is
implementation dependent and should be easy to benchmark.

oliver@athena.mit.edu (James D. Oliver III) (11/09/90)

I guess I need to make this clear: the random number genertor itself is
vectorizable, however, the values of interest to me are *functions* of
these randome variables.  To be more specific, I'm generating a series of
randomly oriented 3-D unit vectors.  The fastest way to do this is to
generate two random numbers for x and y and calculate z as long as x**2 +
y**2 <= 1.  This process, however, involves a conditional loop back to the
ranf() function and thus is not vectorizable.  However, the fact that this
method is not vectorizable is just of incidental interest, since it remains
the fastest way of doing it.

What I'd like to know is, *given* this way of generating my values, should
I expect it to be faster than reading the values from a file?  Like I said,
for this particular case I can run a benchmark, but I was wondering if
there was a general rule of thumb for the speed of I/O operations.


--
____________________________
	Jim Oliver   
	oliver@athena.mit.edu /	joliver@hstbme.mit.edu
	oliver%mitwccf.BITNET@MITVMA.MIT.EDU

chidsey@smoke.brl.mil (Irving Chidsey) (11/09/90)

In article <1990Nov8.215752.7075@athena.mit.edu> oliver@athena.mit.edu (James D. Oliver III) writes:
<I guess I need to make this clear: the random number genertor itself is
<vectorizable, however, the values of interest to me are *functions* of
<these randome variables.  To be more specific, I'm generating a series of
<randomly oriented 3-D unit vectors.  The fastest way to do this is to
<generate two random numbers for x and y and calculate z as long as x**2 +
<y**2 <= 1.  This process, however, involves a conditional loop back to the
<ranf() function and thus is not vectorizable.  However, the fact that this
<method is not vectorizable is just of incidental interest, since it remains
<the fastest way of doing it.
<
<What I'd like to know is, *given* this way of generating my values, should
<I expect it to be faster than reading the values from a file?  Like I said,
<for this particular case I can run a benchmark, but I was wondering if
<there was a general rule of thumb for the speed of I/O operations.
<
<	Jim Oliver   
<	oliver@athena.mit.edu /	joliver@hstbme.mit.edu
<	oliver%mitwccf.BITNET@MITVMA.MIT.EDU

	The general rule is 'Keeping the random numbers in a file is faster,
but takes more space; computing them on the fly is compact but time consuming;
it doesn't matter if you don't need many.'

	Why not code it so that if ranfilnam  is ' ' you compute, if it
contains a name you read from the file; then you can choose one or the other
depending on conditions when you submit the job.

						Irv

-- 
I do not have signature authority.  I am not authorized to sign anything.
I am not authorized to commit the BRL, the DOA, the DOD, or the US Government
to anything, not even by implication.
			Irving L. Chidsey  <chidsey@brl.mil>

efrank@truth.hep.upenn.edu (Ed Frank) (11/09/90)

In article <1990Nov8.215752.7075@athena.mit.edu> oliver@athena.mit.edu (James D. Oliver III) writes:
>I guess I need to make this clear: the random number genertor itself is
>vectorizable, however, the values of interest to me are *functions* of
>these randome variables.  To be more specific, I'm generating a series of
>randomly oriented 3-D unit vectors.  The fastest way to do this is to
>generate two random numbers for x and y and calculate z as long as x**2 +
>y**2 <= 1.  This process, however, involves a conditional loop back to the
>ranf() function and thus is not vectorizable.  However, the fact that this
>method is not vectorizable is just of incidental interest, since it remains
>the fastest way of doing it.
>____________________________
>	Jim Oliver   
>	oliver@athena.mit.edu /	joliver@hstbme.mit.edu
>	oliver%mitwccf.BITNET@MITVMA.MIT.EDU

  First, it is not clear to me that this is the fastest method.  My 
method of generating random direction cosines is as follows.  Let DIR()
be the array of direction cosines, and let RANN() be the (uniform) random
number generator.  PI is 3.14.  Then
        Phi = 2.0 * PI * RANN( dum)       ! in polar coord. Phi and cos(theta)
        Dir(3) = -1.0 + 2.0*RANN( dum)    ! follow a flat distribution.
        Sth    = SQRT( 1.0 = Dir(3)**2)   ! convert cos(theta) to sin(theta)
        Dir(1) = Sth * COS( phi)          ! now get X and y
        Dir(2) = Sth * SIN( phi)          ! direction cosines
will produce the random directions for you.  You might want to include
this in your benchmark.  I think that this will beat your method with
respect to speed.

   Second, I do not completely understand your method. If I am reading
your description correctly, you will *NOT* obtain a uniform
distribution with your technique.  Have you histogrammed the direction
cosines produced by your method? The distributions should be flat.
Keep in mind that if you distribute points on the surface of a sphere
uniformly, and then project that distribution onto a plane, you will
not obtain a flat distrubution bounded by x**2+y**2 = 1.  Most of the
sphere's projected area will be near x**2 + y**2 = 1.

In short: Phi is flat from 0 to 2*pi
          Cos(theta) is flat from -1 to 1

Ed Frank
efrank@upenn5.hep.upenn.edu

dodson@convex.COM (Dave Dodson) (11/10/90)

In article <1990Nov8.215752.7075@athena.mit.edu> oliver@athena.mit.edu (James D. Oliver III) writes:
>To be more specific, I'm generating a series of
>randomly oriented 3-D unit vectors.  The fastest way to do this is to
>generate two random numbers for x and y and calculate z as long as x**2 +
>y**2 <= 1.  This process, however, involves a conditional loop back to the
>ranf() function and thus is not vectorizable.

I'd generate an array of candidates and filter out the inadmissable ones.
This could all be vectorized with a SCILIB routine as follows:

	real x(n),y(n),z(n),r(n)
	integer indx(n)
	do 10 i = 1, n
	    x(i) = 2.0 * ranf() - 1.0
	    y(i) = 2.0 * ranf() - 1.0
	    z(i) = 2.0 * ranf() - 1.0
	    r(i) = x(i)**2 + y(i)**2 + z(i)**2
   10	continue
	call whenflt (n,r,1,1.0,indx,nindx)
cdir$	appropriate compiler directive to ignore dependencies in the loop.
	do 20 i = 1, nindx
	    j = indx(i)
	    t = 1.0 / sqrt(r(j))
	    x(i) = t * x(j)
	    y(i) = t * y(j)
	    z(i) = t * z(j)
   20	continue

You are using one of our competitors' machines, so I haven't tried this,
but I believe it will all vectorize on a Cray, as it does on a Convex.  It
returns nindx random 3-d unit vectors in (x(i),y(i),z(i),i=1,nindx).

P.S.  I don't think you can calculate z from x and y and get a truly uniform
distribution of unit vectors.  You have to generate x, y, and z independently
and reject those outside the unit sphere to get uniformity.  Alternatively,
generate x, y, and z from a Normal distribution and just normalize to unit
length, but, of course, it is much harder to get Normal numbers than uniform.

----------------------------------------------------------------------

Dave Dodson		                             dodson@convex.COM
Convex Computer Corporation      Richardson, Texas      (214) 497-4234

oliver@athena.mit.edu (James D. Oliver III) (11/12/90)

In article <32565@netnews.upenn.edu> efrank@truth.hep.upenn.edu (Ed Frank) writes:

>  First, it is not clear to me that this is the fastest method.  My 
>method of generating random direction cosines is as follows.  Let DIR()
>be the array of direction cosines, and let RANN() be the (uniform) random
>number generator.  PI is 3.14.  Then
>        Phi = 2.0 * PI * RANN( dum)       ! in polar coord. Phi and cos(theta)
>        Dir(3) = -1.0 + 2.0*RANN( dum)    ! follow a flat distribution.
>        Sth    = SQRT( 1.0 = Dir(3)**2)   ! convert cos(theta) to sin(theta)
>        Dir(1) = Sth * COS( phi)          ! now get X and y
>        Dir(2) = Sth * SIN( phi)          ! direction cosines
>will produce the random directions for you.  You might want to include
>this in your benchmark.  I think that this will beat your method with
>respect to speed.

>   Second, I do not completely understand your method. If I am reading
>your description correctly, you will *NOT* obtain a uniform
>distribution with your technique.  Have you histogrammed the direction
>cosines produced by your method? The distributions should be flat.
>Keep in mind that if you distribute points on the surface of a sphere
>uniformly, and then project that distribution onto a plane, you will
>not obtain a flat distrubution bounded by x**2+y**2 = 1.  Most of the
>sphere's projected area will be near x**2 + y**2 = 1.

>In short: Phi is flat from 0 to 2*pi
>          Cos(theta) is flat from -1 to 1

Well, I hadn't originally intended to go into this, but your post and
several e-mail messages I've recieved indicate that the method of
generating the uniform vectors needs to be explained a little better than
my admittedly sketchy description.  Looking back, I realized that I gave
the impression that I directly substituted the values of my two random
numbers as x and y, which of course does not give the propert distribution.

The method I used was suggested by Marsaglia (Ann. Math. Stat., 43:645-646,
1972).  I stumbled onto it entirely by accident, and it seems to be the
most elegant method out there.  Briefly, it is as follows:

1) Generate two random uniform variates u and v on (-1,1).  If S = u**2 +
v**2 > 1, then generate new values for u and v.

2) Calculate Q = sqrt(1-S), then
	x=2.*u*Q
	y=2.*v*Q
	z=1.-2.*S

The efficiency of step 1) goes as the area of a unit square divided by the
area of a unit circle, or 4/pi, so the average number of random numbers
generated per success is 8/pi, or 2.55.  Thus the method requires 2.55
random function calls plus a square root to produce a vector, as opposed to
your method, which requires 2 random function calls, one square root, and
two calls to trigonometric functions.

Marsaglia provides the proof for this method, which basically results from
the fact that the distribution of S is a uniform variate on (0,1) and
independent of (u,v).

This method is significantly faster than any method I've used requiring
trig functions.  Although I haven't tried the one you mention above, I'm
failry confident that it is faster than that one as well.

--
____________________________
	Jim Oliver   
	oliver@athena.mit.edu /	joliver@hstbme.mit.edu
	oliver%mitwccf.BITNET@MITVMA.MIT.EDU

tbb@sumac02.cray.com (Todd Blachowiak) (11/13/90)

In article <1990Nov8.215752.7075@athena.mit.edu>, oliver@athena.mit.edu (James D. Oliver III) writes:
|> I guess I need to make this clear: the random number genertor itself is
|> vectorizable, however, the values of interest to me are *functions* of
|> these randome variables.  To be more specific, I'm generating a series of
|> randomly oriented 3-D unit vectors.  The fastest way to do this is to
|> generate two random numbers for x and y and calculate z as long as x**2 +
|> y**2 <= 1.  This process, however, involves a conditional loop back to the
|> ranf() function and thus is not vectorizable.  However, the fact that this
|> method is not vectorizable is just of incidental interest, since it remains
|> the fastest way of doing it.

would something like this work for you.

     x(i) = ranf()
     y(i) = (1-x(i)*x(i))*ranf()
     z(i) = 1-x(i)*x(i)-y(i)*y(i)

I believe this would vectorize for you and it gets rid of your conditional.
Also, you do not have to make any wasted calls to the random number generator.

|> 
|> What I'd like to know is, *given* this way of generating my values, should
|> I expect it to be faster than reading the values from a file?  Like I said,
|> for this particular case I can run a benchmark, but I was wondering if
|> there was a general rule of thumb for the speed of I/O operations.
|> 
|> 
|> --
|> ____________________________
|> 	Jim Oliver   
|> 	oliver@athena.mit.edu /	joliver@hstbme.mit.edu
|> 	oliver%mitwccf.BITNET@MITVMA.MIT.EDU

-- 
Todd Blachowiak    tbb@cray.com

oliver@athena.mit.edu (James D. Oliver III) (11/13/90)

In article <108474@convex.convex.com>, dodson@convex.COM (Dave Dodson) writes:
>	real x(n),y(n),z(n),r(n)
>	integer indx(n)
>	do 10 i = 1, n
>	    x(i) = 2.0 * ranf() - 1.0
>	    y(i) = 2.0 * ranf() - 1.0
>	    z(i) = 2.0 * ranf() - 1.0
>	    r(i) = x(i)**2 + y(i)**2 + z(i)**2
>   10	continue
>	call whenflt (n,r,1,1.0,indx,nindx)
>cdir$	appropriate compiler directive to ignore dependencies in the loop.
>	do 20 i = 1, nindx
>	    j = indx(i)
>	    t = 1.0 / sqrt(r(j))
>	    x(i) = t * x(j)
>	    y(i) = t * y(j)
>	    z(i) = t * z(j)
>   20	continue

Your algorithm is the same as the original scheme proposed by von Neumann
for generating random vectors.  It's efficiency is given by the area of a
unit cube divided by a unit sphere, so 18/pi=5.73 random numbers plus a
square root must be generated for each vector, as opposed to the 2.55 plus
a square root for the Marsaglia method I posted in a recent follow-up

>You are using one of our competitors' machines, so I haven't tried this,
>but I believe it will all vectorize on a Cray, as it does on a Convex.  

Hey, offer me a price, my code knows no loyalty! :-)

>P.S.  I don't think you can calculate z from x and y and get a truly uniform
>distribution of unit vectors.  You have to generate x, y, and z independently
>and reject those outside the unit sphere to get uniformity.

As I said, Marsaglia devised a clever way around this; see a more recent
posting. 
--
____________________________
	Jim Oliver   
	oliver@athena.mit.edu /	joliver@hstbme.mit.edu
	oliver%mitwccf.BITNET@MITVMA.MIT.EDU

dodson@convex.COM (Dave Dodson) (11/14/90)

In article <1990Nov13.053812.19914@athena.mit.edu> oliver@athena.mit.edu (James D. Oliver III) writes:
>Your algorithm is the same as the original scheme proposed by von Neumann
>for generating random vectors.  It's efficiency is given by the area of a
>unit cube divided by a unit sphere, so 18/pi=5.73 random numbers plus a
>square root must be generated for each vector, as opposed to the 2.55 plus
>a square root for the Marsaglia method I posted in a recent follow-up

The intended contribution in my posting was how to vectorize the generation
of random unit vectors.  I suggest you combine Marsaglia's rejection algorithm
with my use of whenfle to produce a bunch of unit vectors in vector mode.
With the entire process vectorized, I would expect it to be much faster to
generate them than to input them.

----------------------------------------------------------------------

Dave Dodson		                             dodson@convex.COM
Convex Computer Corporation      Richardson, Texas      (214) 497-4234