[comp.lang.c] random number generator random

njk@diku.dk (Niels J|rgen Kruse) (09/13/90)

I am crossposting this to comp.lang.c because interest in
random number generators surface there regularly.

ben@dit.lth.se (Ben Smeets) writes:
>After doing some simulations I got suspicious about the UNIX random
>number generator random() available on the SUN3-xx machines (ver 4.1).
>I measured the distribution of the intervals in which the random numbers
>lie above 0.045 times the total range of the generated numbers. The distribution
>I obtained is not at all what one should expect if the random numbers are
>really uniformly distributed in [0..2^31-1] and are independent.

This sounds a lot like the Gap test to me.  Is it?  This is
very interesting, your test apply to the most significant bits,
which should be the best in an additive congruential generator
like random().  Your test must be very sensitive or there must
be a bug in it.  Could you mail me the source for your test?
I would like to reproduce your results.

>So my questions are:
>       1) How does random() really work?(I do not know it)

The SunOS random is the same as the Bsd random, at least it
produce identical streams and has the same bugs. What i say is
based on the Bsd random.  With the default statesize, the
algorithm is s[i] = s[i-3] + s[i-31].  The number returned is
s[i]>>1, so that 31 bits remain in the output.  Initialization
of s[1] .. s[31] is done with a LCG with power of 2 modulus
(Ick!).

>       2) Has anybody else noted some stange things with
>          this random number generator.?

HahAAa!  Have i noticed something?  Nyah nyah nyah.  OK here goes:

1)  There is a severe shortage of seeds that produce streams
which are reasonably independent of each other.  If ``random()&01''
is used to produce a bitstream, as recommended in the manual
for random(), there is effectively only 4 different seeds
available, as all seeds that are equivalent modulo 4 result in
identical streams.  In general, seeds that agree on the N least
significant bits result in streams that agree number for number
on the N-1 least significant bits.  The reason for this misery
is the sloppy initialization.  Random() is not the only PRNG with
a deficiency like this, the Fortran implementation of the
subtractive method recommended by Knuth in ACP vol 2 ed 2,
p 171-172, has the same problem.

IMHO, a good PRNG should
  a)  allow seed vectors of arbitrary length, rather than just
      a single word of seed material.
  b)  produce streams that can be multiplexed together to form
      a stream of no worse quality than the constituent
      streams, even when the seed vectors differ in but a
      single bit located anywhere.

2)  The least significant bits of random() are detectably
nonrandom, when more than about 5e7 numbers from the stream are
examined.  The test used is a 4 dimensional recurrence
frequency test of the 2 least significant bits (%).  This is
due, not to the sloppy initialization, but to the poor choice
                             31   3          63
of trinomials for random(), x  + x  + 1 and x   + x + 1 for the
2 largest state table sizes.  These are among the worst
possible choices, according to my tests.  The morale is, that
you can't just grab any trinomials and that period is a poor
measure of quality.

IMHO, a good PRNG should have all bits of equal, high quality.

3)  Then there is the setstate bug.  Setstate() is used to swap
states to start generating from another stream.  This involves
encoding the positions of 2 pointers into the state table in a
single word of the table and doing the reverse for the new
state.  I had to chuckle when i saw the comment preceding
setstate in the source code.  I quote:

 * Note that due to the order in which things are done, it is OK to call
 * setstate() with the same state as the current state.

The code then proceeds to do things in the WRONG order.
The effect of this is that if you preceed each call to random()
with a call to setstate() and the state happens to be one and
the same, then the 2 pointers into the state table are only
advanced on every *second* number generated, rather than each
time.  Needless to say, this result in a stream of rather
inferior quality.  The funny thing is that the author had
considered just this pattern of use, yet failed to see the bug.
I guess it only goes to show yet again that

        You can eyeball a piece of code till you bleed,
            but there is no substitute for testing.

Presence or absence of the setstate bug can be determined with
this little program:

-------CUT----------CUT---------CUT---------
#include <stdio.h>

typedef union {
          long tab[32];
          char c[sizeof( union { long tab[32]; } )];
        } ran_s;

extern long random();
extern char *initstate(), *setstate();

static ran_s Trs;

#define ran_open(seed) ( \
  (void) initstate ((unsigned)(seed),Trs.c,sizeof(Trs)), Trs \
)

#define get_r(r) ( (void) setstate ((r).c), random() )

main() {
  ran_s r = ran_open(0);
  char *result = "NOT";
  unsigned long tab[100];
  int i;

  for (i = 0; i < 100; i++)
    tab[i] = get_r(r);
  srandom(0);
  for (i = 0; i < 100; i++)
    if (tab[i] != random())
      result = "IS";
  printf ("Setstate bug %s present.\n",result);
  exit(0);
}
-------CUT----------CUT---------CUT---------

IMHO, this setstate nonsense stink.  A good interface design
for a PRNG should allow the desired stream to be specified
every time a random number is requested at no additional cost.

>       3) Are the people who can provide me with a good random
>          number generator?

I have one that i think is good, but i am loath to part with
it, as it is still under development and i haven't nearly tested
it as much as i would like.  However, if your test really flunk
random() and not mine, i will mail you a copy, although i will
have to tear my heart out.

(%): proceed as follows.  Take the 2 least significant bits of
each number, 4 at a time to form 8 bit samples.  Take 50
samples at a time to form tuples.  Count the number of
different samples in each tuple and subtract from 50.  This is
the number of recurrences in each tuple.  Count the number of
times there are 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, and >= 11
recurrences and compare with the numbers predicted for a truly
random stream.  The proper probabilities can be computed using
the formula given in ACP vol 2 ed 2 under the Collision test.
Perform Chi square analysis with 11 degrees of freedom.
-- 
Niels J|rgen Kruse 	DIKU Graduate 	njk@diku.dk

quan@sol.surv.utas.oz (Stephen Quan) (09/17/90)

Don't know if anyone interested, but I use :

int random()
{
  static int seed = 0;
  seed = (seed * 13077 + 1729) % 32768;
  return seed;
}

Which is short, simple, will give a cycle of 32768 however, but seemlingly
a fairly scattered region.  I wrote it, so I like it!!!  Only my 2c worth.

Stephen Quan,
University Of Tasmania,
School of Surveying,
Aussie.