eric@snark.thyrsus.com (Eric S. Raymond) (05/13/91)
In one of my programs, I have a requirement to generate random shuffles of arbitrary size. I am using the obvious method, as follows: ----------------------------- CUT HERE ------------------------------------ #define TRANSPOSITIONS(n) (10 * n) void shuffle(size, shuffle) int size, *shuffle; { int i, j; for (i = 0; i < size; i++) shuffle[i] = i; for (i = 0; i < TRANSPOSITIONS(size); i++) { int holder, k; j = roll(size); k = roll(size); holder = shuffle[k]; shuffle[k] = shuffle[j]; shuffle[j] = holder; } } ----------------------------- CUT HERE ------------------------------------ Two questions: 1) Is there a known way of choosing a formula for TRANSPOSITIONS(n) to guarantee random mixing in some a priori sense? 2) This is, obviously, a naive method. Is there a `smart' algorithm that works better? Replies by email, please. -- Eric S. Raymond = eric@snark.thyrsus.com (mad mastermind of TMN-Netnews)
dd2i+@andrew.cmu.edu (Douglas Michael DeCarlo) (05/14/91)
In article <1bQCYz#6lHOhh7Jxk2b4WjLh65vrXpL=eric@snark.thyrsus.com> eric@snark.thyrsus.com (Eric S. Raymond) writes: > In one of my programs, I have a requirement to generate random shuffles > of arbitrary size. I am using the obvious method, as follows: > ..... > ..... > 1) Is there a known way of choosing a formula for TRANSPOSITIONS(n) to > guarantee random mixing in some a priori sense? > >2) This is, obviously, a naive method. Is there a `smart' algorithm that > works better? To swap N items in an array A[0..(n-1)], the following O(N) algorithm works: for (i=0; i < N; i++) { swap(i, i + random(N - i)); } Where random(r) is a function which returns a random integer in 0..(r-1) and swap(a, b) swaps the stuff in array A[a] with A[b]. - Doug
joe@proto.com (Joe Huffman) (05/16/91)
dd2i+@andrew.cmu.edu (Douglas Michael DeCarlo) writes: >In article <1bQCYz#6lHOhh7Jxk2b4WjLh65vrXpL=eric@snark.thyrsus.com> >eric@snark.thyrsus.com (Eric S. Raymond) writes: >> In one of my programs, I have a requirement to generate random shuffles >> of arbitrary size. I am using the obvious method, as follows: >To swap N items in an array A[0..(n-1)], the following O(N) algorithm works: >for (i=0; i < N; i++) { > swap(i, i + random(N - i)); >} I have a tournament scheduleing program that needs to do the 'draw' and place some of the entries ('seeds' go in specific locations) in random positions. I used a similar algorithm at first (swap(random(N),random(N)) and discovered that it wasn't random enough. I did the statisitical analysis but can't seem to find it right now. The problem with mine was that there was an abnormally high probability that the i'th element would remain in it's original position. I believe the above will suffer from the last few elements being unlikely to be moved to the beginning of the array. Perhaps in DeCarlo's algorithm one could use 'swap(i, random(N))' instead (I didn't think of that and haven't done the analysis). What I ended up doing was something like this (which may not be feasible in Raymond's case): void shuffle(int n, list new_list, list org_list) { while(n) { element temp = get_and_remove_element(org_list,random(n)); add_element(new_list,temp); n--; } } -- joe@proto.com
jgk@osc.COM (Joe Keane) (05/16/91)
In article <8c=sAju00Vp_Ihxl4r@andrew.cmu.edu> dd2i+@andrew.cmu.edu (Dee Dee Two Eye) writes: >To swap N items in an array A[0..(n-1)], the following O(N) algorithm works: > >for (i=0; i < N; i++) { > swap(i, i + random(N - i)); >} > >Where random(r) is a function which returns a random integer in 0..(r-1) >and swap(a, b) swaps the stuff in array A[a] with A[b]. This algorithm is much better than the naive one, which repeatedly swaps two randomly chosen elements. At the end of the loop, all permutations are equally probable, assuming the random function is uniform. The naive algorithm only approaches this asymptotically. Note that if you seed your random-number generator with a 32-bit integer, then there are only 2^32 possible permutations, no matter how big the array is. If you worry about such things, like i do, then you should make multiple passes over the array. Following i've included `shuffle.c', a program i wrote a while ago which uses the algorithm described above. I've found that it's a useful utility, so you may want to `clip and save' it. -- Joe Keane, professional C++ programmer jgk@osc.com (...!uunet!stratus!osc!jgk) -- #include <stdio.h> #include <sys/time.h> #define DEBUG 0 extern char* malloc(); extern char* realloc(); struct line { int length; char* ptr; }; int main (argc, argv) int argc; char* argv[]; { struct line* master_ptr; int master_size; #if DEBUG fputs("shuffle: reading lines...\n", stderr); #endif { int master_capacity; char *buffer_ptr; int buffer_capacity; master_capacity = 256; master_ptr = (struct line*)malloc(master_capacity * sizeof(struct line)); if (!master_ptr) goto out_of_memory; master_size = 0; buffer_capacity = 256; buffer_ptr = malloc(buffer_capacity); if (!buffer_ptr) goto out_of_memory; for (;;) { int c; int buffer_size; char* line_ptr; c = getchar(); if (c == EOF) goto eof; if (master_size >= master_capacity) { master_capacity *= 2; master_ptr = (struct line*)realloc(master_ptr, master_capacity * sizeof (struct line)); if (!master_ptr) goto out_of_memory; } buffer_size = 0; while (c != '\n') { if (buffer_size >= buffer_capacity) { buffer_capacity *= 2; buffer_ptr = realloc(buffer_ptr, buffer_capacity); } buffer_ptr[buffer_size] = c; buffer_size++; c = getchar(); if (c == EOF) { fputs("shuffle: adding newline at end of file\n", stderr); break; } } line_ptr = malloc(buffer_size); if (!line_ptr) goto out_of_memory; memcpy(line_ptr, buffer_ptr, buffer_size); master_ptr[master_size].length = buffer_size; master_ptr[master_size].ptr = line_ptr; master_size++; if (c == EOF) goto eof; } eof: free(buffer_ptr); } #if DEBUG fprintf(stderr, "shuffle: total of %d lines read\n", master_size); #endif { int pass; for (pass = 0; pass < 16; pass++) { struct timeval tv; int line_number; gettimeofday(&tv, 0); #if DEBUG fprintf(stderr, "shuffle: doing pass %d, time is %d seconds, %d microseconds...\n", pass, tv.tv_sec, tv.tv_usec); #endif srandom(tv.tv_sec ^ tv.tv_usec); for (line_number = 0; line_number < master_size; line_number++) { int other_line; struct line temp; other_line = random() % (master_size - line_number) + line_number; temp = master_ptr[line_number]; master_ptr[line_number] = master_ptr[other_line]; master_ptr[other_line] = temp; } } } #if DEBUG fputs("shuffle: writing lines...\n", stderr); #endif { int line_number; for (line_number = 0; line_number < master_size; line_number++) { char* ptr; char* end; ptr = master_ptr[line_number].ptr; end = ptr + master_ptr[line_number].length; while (ptr < end) { putchar(*ptr); ptr++; } putchar('\n'); } } #if DEBUG fputs("shuffle: all done\n", stderr); #endif return 0; out_of_memory: fputs("shuffle: out of memory\n", stderr); return 1; }
dik@cwi.nl (Dik T. Winter) (05/16/91)
In article <1991May15.183447.1437@proto.com> joe@proto.com (Joe Huffman) writes: > >To swap N items in an array A[0..(n-1)], the following O(N) algorithm works: > > >for (i=0; i < N; i++) { > > swap(i, i + random(N - i)); > >} > > I used a similar algorithm at first (swap(random(N),random(N)) > and discovered that it wasn't random enough. This is not at all similar. > > void shuffle(int n, list new_list, list org_list) > { > while(n) > { > element temp = get_and_remove_element(org_list,random(n)); > add_element(new_list,temp); > n--; > } > } And if you look long enough you will see that this is equivalent to the originally proposed algorithm. -- dik t. winter, cwi, amsterdam, nederland dik@cwi.nl
richard@aiai.ed.ac.uk (Richard Tobin) (05/16/91)
[What the #$%^* is it that puts "Followup-to: poster" on articles???] In article <1bQCYz#6lHOhh7Jxk2b4WjLh65vrXpL=eric@snark.thyrsus.com> eric@snark.thyrsus.com (Eric S. Raymond) writes: >In one of my programs, I have a requirement to generate random shuffles >of arbitrary size. Swap each element of the array, in order, with a random later-or-same element of the array, eg for(i=0; i<n; i++) { int j, t; j = i + random() % (n-i); t = a[i]; a[i] = a[j]; a[j] = t; } This essentially chooses a random element for the first position, then a random one of those remaining for the second position, and so on. Note that swapping each element with a random element chosen from the whole array (ie j = random() % n) doesn't give you a perfectly uniform distribution. > Replies by email, please. This is a common request, so I'm posting. -- Richard -- Richard Tobin, JANET: R.Tobin@uk.ac.ed AI Applications Institute, ARPA: R.Tobin%uk.ac.ed@nsfnet-relay.ac.uk Edinburgh University. UUCP: ...!ukc!ed.ac.uk!R.Tobin
raman@cs.rochester.edu (Rajeev Raman) (05/16/91)
>In article <1bQCYz#6lHOhh7Jxk2b4WjLh65vrXpL=eric@snark.thyrsus.com> >eric@snark.thyrsus.com (Eric S. Raymond) writes: > In one of my programs, I have a requirement to generate random shuffles > of arbitrary size. I am using the obvious method, as follows: > [exchange a random pair of elements] Persi Diaconis and Mehrdad Shahshahani (Z. Wahrscheinlichkeitstheorie und verwandte Gebiete, 57:159-179, (1981)) show the following: Let T_k(P) be the probability that permutation P is generated after k such random transpositions. Let D_k = \sum_{all permutations P} | T_k(P) - 1/n! | Let k = cn + (n ln n)/2. Then D_k <= b e^{-2c}, for some constant b. A remark from their paper: "The problem studied here arose from two independent sources. The first source involved computer generation of a random permutation. [Knuth (1969) pp 124--126 shows that the method mentioned on the net involving n-1 transpositions produces exactly a uniform distribution.] One of us had a programmer who used the measure [T_k] to generate random permutations. It is not hard to see that for n > 2, [T_k] is never exactly uniform. A discussion arose about how large k should be to make [T_k] exactly uniform." Another remark (which they also make) to achieve a close to uniform distribution, the algorithm should do the following: pick i and j at random, and transpose regardless of whether i=j or not. (Ie, count as a transposition even the identity transposition.) Otherwise, for even k you get only even permutations, &c. This also can be used to prove that after k transpositions (even k) the probability that you get an even permutation is 1/2 (+/-) some finite quantity, for n > 2. Thus the distribution is never exactly uniform. Rajeev. -- Rajeev Raman ARPA: raman@cs.rochester.edu UUCP: ...!rutgers!rochester!raman
jroth@allvax.enet.dec.com (Jim Roth) (05/17/91)
In article <4787@osc.COM>, jgk@osc.COM (Joe Keane) writes... > ... stuff deleted ... > >Note that if you seed your random-number generator with a 32-bit integer, then >there are only 2^32 possible permutations, no matter how big the array is. If >you worry about such things, like i do, then you should make multiple passes >over the array. This won't result in any more possible permutations, only a different set of them. - Jim