[comp.unix.questions] random number generator random

ben@dit.lth.se (Ben Smeets) (09/05/90)

Hi:

After doing some simulations I got suspicious about the UNIX random
number generator random() available on the SUN3-xx machines (ver 4.1).
I measured the distribution of the intervals in which the random numbers
lie above 0.045 times the total range of the generated numbers. The distributionI obtained is not at all what one should expect if the random numbers are
really uniformly distributed in [0..2^31-1] and are independent.

So my questions are:
	1) How does random() really work?(I do not know it)

	2) Has anybody else noted some stange things with
	   this random number generator.?

	3) Are the people who can provide me with a good random
	   number generator?

Thanks for your cooperation.

Ben Smeets

sfy@dmp.csiro.au (Shane Youl) (09/06/90)

I came across the same problem some time ago, when I needed a good random
number generator for some Monte Carlo simulations. The following is the
response to my plea for assistance then. I subsequently received more
information from a guy at CERN and can dig this up for you if you need it.
This latter stuff is in Fortran.
-------

From:   IN%"IVANOVIC%VAXR@circus.llnl.gov"  "Vladimir Ivanovic 415.423.7786"  9-
MAR-1989 10:03
To:     info-vax@kl.sri.COM
Subj:   Random Number Generators: the minimal standard

    Below are the 4 implementations of the minimal standard of Park
    and Miller (see "Random Number Generators: Good Ones are Hard to
    Find", Communications of the ACM, October 1988 v31 #10.)  I quote
    from this highly recommended article:

        There is really no argument in support of ANY of the example
        generators cited in this section [The section's title: A Sampling
        of Inadequate Generators].  Most of them are naive implementations
        a deceptively simple idea.  Even the best of them, those which
        have a full period generator and are correctly implemented,
        are inferior to the minimal standard.
        ...
        If you have need for a random number generator, particularly
        one that will port to a variety of systems, and if you are not
        a specialist in random number generation and do not want to
        become one, use the minimal standard.  It should not be presumed
        that it is easier to write a better one.
        ...
        We also recommend the articles by L'Ecuyer and by Wichmann and
        Hill.  The COMBINED generators proposed in these articles represent
        a logical extension of the minimal standard.  These generators
        appear to yield better statistical properties in those (rare)
        situations when the minimal standard alone may be inadequate.

    So there you have it.  A previous posting giving 16-bit and 32-bit
    versions of the combined generators and here several portable versions
    of the minimal standard.

    Since the code is SO simple, there NO excuse for not using at least
    the minimal standard.  Happy hacking.

................................................................................

program MinimalStandard;

    var
        ISeed : integer;
        RSeed : real;

    function Random1 : real;
    { Integer Version 1 :                                                   }
    { NOT a correct version unless maxint is 2**46 - 1 or larger.           }
        const
            a = 16807;
            m = 2147483647;
    begin
        ISeed := (a * ISeed) mod m;
        Random1 := ISeed / m;
    end;

    function Random2 : real;
    { Integer Version 2:                                                    }
    { Correct on any system for which maxint is 2**31 - 1 or larger.        }
        const
            a = 16807;
            m = 2147483647;
            q = 127773;     { m div a }
            r = 2836;       { m mod a }
        var
            lo, hi, test : integer;
    begin
        hi := ISeed div q;
        lo := ISeed mod q;
        test := a * lo - r * hi;
        if test > 0
            then ISeed := test
            else ISeed := test + m;
        Random2 := ISeed / m;
    end;

    function Random3 : real;
    { Real Version 1:                                                       }
    { Correct if reals are represented with a 46-bit or larger mantissa     }
    { (excluding the sign bit).                                             }
        const
            a = 16807.0;
            m = 2147483647.0;
        var
            temp : real;
    begin
        temp := a * RSeed;
        RSeed := temp - m * Trunc(temp / m);
        Random3 := RSeed / m;
    end;

    function Random4 : real;
    { Real Version 2:                                                       }
    { Correct on any system for which reals are represented with a 32-bit   }
    { or larger mantissa (including the sign bit).                          }
        const
            a = 16807.0;
            m = 2147483647.0;
            q = 127773.0;   { m div a }
            r = 2836.0;     { m mod a }
        var
            lo, hi, test : real;
    begin
        hi := Trunc(RSeed / q);
        lo := RSeed - q * hi;
        test := a * lo - r * hi;
        if test > 0.0
            then RSeed := test
            else RSeed := test + m;
        Random4 := RSeed / m;
    end;
begin
    { Your code here... }
end.


-- 
                                         ____   _____     ____    ____
  Shane Youl                            /    \ /       / /    \  /    \   
  CSIRO Division of Mineral Products   /      /_____  / /_____/ /     /
  PO Box 124   Port Melbourne  3207   /            / / /   \   /     /
  AUSTRALIA                           \____/ _____/ / /     \  \____/
  Internet : sfy@dmp.CSIRO.AU     
  Phone    : +61-3-647-0211            SCIENCE  ADVANCING  AUSTRALIA

staff@cadlab.sublink.ORG (Alex Martelli) (09/06/90)

ben@dit.lth.se (Ben Smeets) writes:

>Hi:

>After doing some simulations I got suspicious about the UNIX random
>number generator random() available on the SUN3-xx machines (ver 4.1).
	...
>	2) Has anybody else noted some stange things with
>	   this random number generator.?

Don't know about this particular one, but random number generators are
notoriously easy-to-botch...

>	3) Are the people who can provide me with a good random
>	   number generator?

Knuth, "Art of Computer programming", volume 2, is a great reference.
Here's an implementation which is SLOW (boy, is it ever!) but gives
random uniform variates with GREAT spectral characteristics, and
(very important to ME) the SAME sequence of numbers on ANY platform 
whatsoever (should not be hard to recode in C):

       FUNCTION RAN1(IDUM)
*
* PORTABLE random-number generation routine, computationally costly
* but with nonpareil characteristics.  From "Numerical recipes", p.196;
* identifiers as in the original; added explicit declarations & SAVE.
*
* Call with IDUM < 0 to re-seed (not needed the first time), IDUM == 0
* for normal operation.
*
* Note the SAME sequence is generated on ANY machine (crucial for use
* of random numbers for any intra-machine test!).
*
       REAL*4 RAN1,R(97),RM1,RM2
       INTEGER*4 IDUM,M1,M2,M3,IA1,IA2,IA3,IC1,IC2,IC3,IFF,IX1,IX2,IX3,J
       PARAMETER (M1=259200,IA1=7141,IC1=54773,RM1=1.0/M1)
       PARAMETER (M2=134456,IA2=8121,IC2=28411,RM2=1.0/M2)
       PARAMETER (M3=243000,IA3=4561,IC3=51349)
       SAVE IFF, R, IX1, IX2, IX3
       DATA IFF /0/
*
* Initialize on first call, or whenever the parm is negative
*
      IF(IDUM.lt.0 .or. IFF.eq.0) THEN
          IFF=1
*                                   seed first routine
          IX1=MOD(IC1-IDUM,M1)
          IX1=MOD(IA1*IX1+IC1,M1)
*                                   seed second routine
          IX2=MOD(IX1,M2)
          IX1=MOD(IA1*IX1+IC1,M1)
*                                   seed third routine
          IX3=MOD(IX1,M3)
*                                   fill table w. 1st/2nd routines
          DO 11, J=1,97
                 IX1=MOD(IA1*IX1+IC1,M1)
                 IX2=MOD(IA2*IX2+IC2,M2)
*                                   combine hi with lo
                 R(J)=(FLOAT(IX1)+FLOAT(IX2)*RM2)*RM1
11        CONTINUE
          IDUM=1
      ENDIF
	  NCALLS = NCALLS + 1
*
* Starting point, save when (re)initializing: get next number on 
* each of the three sequences
*
       IX1=MOD(IA1*IX1+IC1,M1)
       IX2=MOD(IA2*IX2+IC2,M2)
       IX3=MOD(IA3*IX3+IC3,M3)
*                                   take index from 3rd sequence
       J=1+(97*IX3)/M3
*                                   sanity check, may omit in production
       IF(J.GT.97.OR.J.LT.1) THEN
		   WRITE(*,*) 'RAN1: J=',J,' IX3=',IX3
		   STOP
	   ENDIF
*                                   return this entry of table...
       RAN1=R(J)
*                                   ...and refill it
       R(J)=(FLOAT(IX1)+FLOAT(IX2)*RM2)*RM1
       RETURN
       END

-- 
Alex Martelli - CAD.LAB s.p.a., v. Stalingrado 45, Bologna, Italia
Email: (work:) staff@cadlab.sublink.org, (home:) alex@am.sublink.org
Phone: (work:) ++39 (51) 371099, (home:) ++39 (51) 250434; 
Fax: ++39 (51) 366964 (work only; any time of day or night).

gwyn@smoke.BRL.MIL (Doug Gwyn) (09/07/90)

In article <1990Sep6.032718.16696@dmp.csiro.au> sfy@dmp.csiro.au (Shane Youl) writes:
>    (see "Random Number Generators: Good Ones are Hard to
>    Find", Communications of the ACM, October 1988 v31 #10.)

I found it hard to believe that CACM accepted that article, which was
full of unjustified assertions and did not at all advance the art of
designing pseudo-random sequence generators.

njk@diku.dk (Niels J|rgen Kruse) (09/13/90)

I am crossposting this to comp.lang.c because interest in
random number generators surface there regularly.

ben@dit.lth.se (Ben Smeets) writes:
>After doing some simulations I got suspicious about the UNIX random
>number generator random() available on the SUN3-xx machines (ver 4.1).
>I measured the distribution of the intervals in which the random numbers
>lie above 0.045 times the total range of the generated numbers. The distribution
>I obtained is not at all what one should expect if the random numbers are
>really uniformly distributed in [0..2^31-1] and are independent.

This sounds a lot like the Gap test to me.  Is it?  This is
very interesting, your test apply to the most significant bits,
which should be the best in an additive congruential generator
like random().  Your test must be very sensitive or there must
be a bug in it.  Could you mail me the source for your test?
I would like to reproduce your results.

>So my questions are:
>       1) How does random() really work?(I do not know it)

The SunOS random is the same as the Bsd random, at least it
produce identical streams and has the same bugs. What i say is
based on the Bsd random.  With the default statesize, the
algorithm is s[i] = s[i-3] + s[i-31].  The number returned is
s[i]>>1, so that 31 bits remain in the output.  Initialization
of s[1] .. s[31] is done with a LCG with power of 2 modulus
(Ick!).

>       2) Has anybody else noted some stange things with
>          this random number generator.?

HahAAa!  Have i noticed something?  Nyah nyah nyah.  OK here goes:

1)  There is a severe shortage of seeds that produce streams
which are reasonably independent of each other.  If ``random()&01''
is used to produce a bitstream, as recommended in the manual
for random(), there is effectively only 4 different seeds
available, as all seeds that are equivalent modulo 4 result in
identical streams.  In general, seeds that agree on the N least
significant bits result in streams that agree number for number
on the N-1 least significant bits.  The reason for this misery
is the sloppy initialization.  Random() is not the only PRNG with
a deficiency like this, the Fortran implementation of the
subtractive method recommended by Knuth in ACP vol 2 ed 2,
p 171-172, has the same problem.

IMHO, a good PRNG should
  a)  allow seed vectors of arbitrary length, rather than just
      a single word of seed material.
  b)  produce streams that can be multiplexed together to form
      a stream of no worse quality than the constituent
      streams, even when the seed vectors differ in but a
      single bit located anywhere.

2)  The least significant bits of random() are detectably
nonrandom, when more than about 5e7 numbers from the stream are
examined.  The test used is a 4 dimensional recurrence
frequency test of the 2 least significant bits (%).  This is
due, not to the sloppy initialization, but to the poor choice
                             31   3          63
of trinomials for random(), x  + x  + 1 and x   + x + 1 for the
2 largest state table sizes.  These are among the worst
possible choices, according to my tests.  The morale is, that
you can't just grab any trinomials and that period is a poor
measure of quality.

IMHO, a good PRNG should have all bits of equal, high quality.

3)  Then there is the setstate bug.  Setstate() is used to swap
states to start generating from another stream.  This involves
encoding the positions of 2 pointers into the state table in a
single word of the table and doing the reverse for the new
state.  I had to chuckle when i saw the comment preceding
setstate in the source code.  I quote:

 * Note that due to the order in which things are done, it is OK to call
 * setstate() with the same state as the current state.

The code then proceeds to do things in the WRONG order.
The effect of this is that if you preceed each call to random()
with a call to setstate() and the state happens to be one and
the same, then the 2 pointers into the state table are only
advanced on every *second* number generated, rather than each
time.  Needless to say, this result in a stream of rather
inferior quality.  The funny thing is that the author had
considered just this pattern of use, yet failed to see the bug.
I guess it only goes to show yet again that

        You can eyeball a piece of code till you bleed,
            but there is no substitute for testing.

Presence or absence of the setstate bug can be determined with
this little program:

-------CUT----------CUT---------CUT---------
#include <stdio.h>

typedef union {
          long tab[32];
          char c[sizeof( union { long tab[32]; } )];
        } ran_s;

extern long random();
extern char *initstate(), *setstate();

static ran_s Trs;

#define ran_open(seed) ( \
  (void) initstate ((unsigned)(seed),Trs.c,sizeof(Trs)), Trs \
)

#define get_r(r) ( (void) setstate ((r).c), random() )

main() {
  ran_s r = ran_open(0);
  char *result = "NOT";
  unsigned long tab[100];
  int i;

  for (i = 0; i < 100; i++)
    tab[i] = get_r(r);
  srandom(0);
  for (i = 0; i < 100; i++)
    if (tab[i] != random())
      result = "IS";
  printf ("Setstate bug %s present.\n",result);
  exit(0);
}
-------CUT----------CUT---------CUT---------

IMHO, this setstate nonsense stink.  A good interface design
for a PRNG should allow the desired stream to be specified
every time a random number is requested at no additional cost.

>       3) Are the people who can provide me with a good random
>          number generator?

I have one that i think is good, but i am loath to part with
it, as it is still under development and i haven't nearly tested
it as much as i would like.  However, if your test really flunk
random() and not mine, i will mail you a copy, although i will
have to tear my heart out.

(%): proceed as follows.  Take the 2 least significant bits of
each number, 4 at a time to form 8 bit samples.  Take 50
samples at a time to form tuples.  Count the number of
different samples in each tuple and subtract from 50.  This is
the number of recurrences in each tuple.  Count the number of
times there are 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, and >= 11
recurrences and compare with the numbers predicted for a truly
random stream.  The proper probabilities can be computed using
the formula given in ACP vol 2 ed 2 under the Collision test.
Perform Chi square analysis with 11 degrees of freedom.
-- 
Niels J|rgen Kruse 	DIKU Graduate 	njk@diku.dk