njk@diku.dk (Niels J|rgen Kruse) (09/13/90)
I am crossposting this to comp.lang.c because interest in random number generators surface there regularly. ben@dit.lth.se (Ben Smeets) writes: >After doing some simulations I got suspicious about the UNIX random >number generator random() available on the SUN3-xx machines (ver 4.1). >I measured the distribution of the intervals in which the random numbers >lie above 0.045 times the total range of the generated numbers. The distribution >I obtained is not at all what one should expect if the random numbers are >really uniformly distributed in [0..2^31-1] and are independent. This sounds a lot like the Gap test to me. Is it? This is very interesting, your test apply to the most significant bits, which should be the best in an additive congruential generator like random(). Your test must be very sensitive or there must be a bug in it. Could you mail me the source for your test? I would like to reproduce your results. >So my questions are: > 1) How does random() really work?(I do not know it) The SunOS random is the same as the Bsd random, at least it produce identical streams and has the same bugs. What i say is based on the Bsd random. With the default statesize, the algorithm is s[i] = s[i-3] + s[i-31]. The number returned is s[i]>>1, so that 31 bits remain in the output. Initialization of s[1] .. s[31] is done with a LCG with power of 2 modulus (Ick!). > 2) Has anybody else noted some stange things with > this random number generator.? HahAAa! Have i noticed something? Nyah nyah nyah. OK here goes: 1) There is a severe shortage of seeds that produce streams which are reasonably independent of each other. If ``random()&01'' is used to produce a bitstream, as recommended in the manual for random(), there is effectively only 4 different seeds available, as all seeds that are equivalent modulo 4 result in identical streams. In general, seeds that agree on the N least significant bits result in streams that agree number for number on the N-1 least significant bits. The reason for this misery is the sloppy initialization. Random() is not the only PRNG with a deficiency like this, the Fortran implementation of the subtractive method recommended by Knuth in ACP vol 2 ed 2, p 171-172, has the same problem. IMHO, a good PRNG should a) allow seed vectors of arbitrary length, rather than just a single word of seed material. b) produce streams that can be multiplexed together to form a stream of no worse quality than the constituent streams, even when the seed vectors differ in but a single bit located anywhere. 2) The least significant bits of random() are detectably nonrandom, when more than about 5e7 numbers from the stream are examined. The test used is a 4 dimensional recurrence frequency test of the 2 least significant bits (%). This is due, not to the sloppy initialization, but to the poor choice 31 3 63 of trinomials for random(), x + x + 1 and x + x + 1 for the 2 largest state table sizes. These are among the worst possible choices, according to my tests. The morale is, that you can't just grab any trinomials and that period is a poor measure of quality. IMHO, a good PRNG should have all bits of equal, high quality. 3) Then there is the setstate bug. Setstate() is used to swap states to start generating from another stream. This involves encoding the positions of 2 pointers into the state table in a single word of the table and doing the reverse for the new state. I had to chuckle when i saw the comment preceding setstate in the source code. I quote: * Note that due to the order in which things are done, it is OK to call * setstate() with the same state as the current state. The code then proceeds to do things in the WRONG order. The effect of this is that if you preceed each call to random() with a call to setstate() and the state happens to be one and the same, then the 2 pointers into the state table are only advanced on every *second* number generated, rather than each time. Needless to say, this result in a stream of rather inferior quality. The funny thing is that the author had considered just this pattern of use, yet failed to see the bug. I guess it only goes to show yet again that You can eyeball a piece of code till you bleed, but there is no substitute for testing. Presence or absence of the setstate bug can be determined with this little program: -------CUT----------CUT---------CUT--------- #include <stdio.h> typedef union { long tab[32]; char c[sizeof( union { long tab[32]; } )]; } ran_s; extern long random(); extern char *initstate(), *setstate(); static ran_s Trs; #define ran_open(seed) ( \ (void) initstate ((unsigned)(seed),Trs.c,sizeof(Trs)), Trs \ ) #define get_r(r) ( (void) setstate ((r).c), random() ) main() { ran_s r = ran_open(0); char *result = "NOT"; unsigned long tab[100]; int i; for (i = 0; i < 100; i++) tab[i] = get_r(r); srandom(0); for (i = 0; i < 100; i++) if (tab[i] != random()) result = "IS"; printf ("Setstate bug %s present.\n",result); exit(0); } -------CUT----------CUT---------CUT--------- IMHO, this setstate nonsense stink. A good interface design for a PRNG should allow the desired stream to be specified every time a random number is requested at no additional cost. > 3) Are the people who can provide me with a good random > number generator? I have one that i think is good, but i am loath to part with it, as it is still under development and i haven't nearly tested it as much as i would like. However, if your test really flunk random() and not mine, i will mail you a copy, although i will have to tear my heart out. (%): proceed as follows. Take the 2 least significant bits of each number, 4 at a time to form 8 bit samples. Take 50 samples at a time to form tuples. Count the number of different samples in each tuple and subtract from 50. This is the number of recurrences in each tuple. Count the number of times there are 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, and >= 11 recurrences and compare with the numbers predicted for a truly random stream. The proper probabilities can be computed using the formula given in ACP vol 2 ed 2 under the Collision test. Perform Chi square analysis with 11 degrees of freedom. -- Niels J|rgen Kruse DIKU Graduate njk@diku.dk
quan@sol.surv.utas.oz (Stephen Quan) (09/17/90)
Don't know if anyone interested, but I use :
int random()
{
static int seed = 0;
seed = (seed * 13077 + 1729) % 32768;
return seed;
}
Which is short, simple, will give a cycle of 32768 however, but seemlingly
a fairly scattered region. I wrote it, so I like it!!! Only my 2c worth.
Stephen Quan,
University Of Tasmania,
School of Surveying,
Aussie.