[sci.math] Shuffle generation

eric@snark.thyrsus.com (Eric S. Raymond) (05/13/91)

In one of my programs, I have a requirement to generate random shuffles
of arbitrary size.  I am using the obvious method, as follows:

----------------------------- CUT HERE ------------------------------------
#define TRANSPOSITIONS(n)	(10 * n)

void shuffle(size, shuffle)
int size, *shuffle;
{
    int	i, j;

    for (i = 0; i < size; i++)
	shuffle[i] = i;
    for (i = 0; i < TRANSPOSITIONS(size); i++)
    {
	int	holder, k;

	j = roll(size);
	k = roll(size);

	holder = shuffle[k];
	shuffle[k] = shuffle[j];
	shuffle[j] = holder;
    }
}
----------------------------- CUT HERE ------------------------------------

Two questions:

1) Is there a known way of choosing a formula for TRANSPOSITIONS(n) to
   guarantee random mixing in some a priori sense?

2) This is, obviously, a naive method.  Is there a `smart' algorithm that
   works better?

Replies by email, please.
-- 
      Eric S. Raymond = eric@snark.thyrsus.com  (mad mastermind of TMN-Netnews)

dd2i+@andrew.cmu.edu (Douglas Michael DeCarlo) (05/14/91)

In article <1bQCYz#6lHOhh7Jxk2b4WjLh65vrXpL=eric@snark.thyrsus.com>
eric@snark.thyrsus.com (Eric S. Raymond) writes:
> In one of my programs, I have a requirement to generate random shuffles
> of arbitrary size.  I am using the obvious method, as follows:
> .....
> .....
> 1) Is there a known way of choosing a formula for TRANSPOSITIONS(n) to
>   guarantee random mixing in some a priori sense?
>
>2) This is, obviously, a naive method.  Is there a `smart' algorithm that
>   works better?

To swap N items in an array A[0..(n-1)], the following O(N) algorithm works:

for (i=0; i < N; i++) {
     swap(i, i + random(N - i));
}

Where random(r) is a function which returns a random integer in 0..(r-1)
and swap(a, b) swaps the stuff in array A[a] with A[b].

- Doug

joe@proto.com (Joe Huffman) (05/16/91)

dd2i+@andrew.cmu.edu (Douglas Michael DeCarlo) writes:

>In article <1bQCYz#6lHOhh7Jxk2b4WjLh65vrXpL=eric@snark.thyrsus.com>
>eric@snark.thyrsus.com (Eric S. Raymond) writes:
>> In one of my programs, I have a requirement to generate random shuffles
>> of arbitrary size.  I am using the obvious method, as follows:

>To swap N items in an array A[0..(n-1)], the following O(N) algorithm works:

>for (i=0; i < N; i++) {
>     swap(i, i + random(N - i));
>}

I have a tournament scheduleing program that needs to do the 'draw' and
place some of the entries ('seeds' go in specific locations) in random
positions.  I used a similar algorithm at first (swap(random(N),random(N))
and discovered that it wasn't random enough.  I did the statisitical 
analysis but can't seem to find it right now.  The problem with mine was 
that there was an abnormally high probability that the i'th element would 
remain in it's original position.  I believe the above will suffer from the 
last few elements being unlikely to be moved to the beginning of the array.

Perhaps in DeCarlo's algorithm one could use 'swap(i, random(N))' instead
(I didn't think of that and haven't done the analysis).

What I ended up doing was something like this (which may not be feasible
in Raymond's case):

void shuffle(int n, list new_list, list org_list)
{
	while(n)
	{
		element temp = get_and_remove_element(org_list,random(n));
		add_element(new_list,temp);
		n--;
	}
}
-- 
joe@proto.com

jgk@osc.COM (Joe Keane) (05/16/91)

In article <8c=sAju00Vp_Ihxl4r@andrew.cmu.edu> dd2i+@andrew.cmu.edu (Dee Dee
Two Eye) writes:
>To swap N items in an array A[0..(n-1)], the following O(N) algorithm works:
>
>for (i=0; i < N; i++) {
>     swap(i, i + random(N - i));
>}
>
>Where random(r) is a function which returns a random integer in 0..(r-1)
>and swap(a, b) swaps the stuff in array A[a] with A[b].

This algorithm is much better than the naive one, which repeatedly swaps two
randomly chosen elements.  At the end of the loop, all permutations are
equally probable, assuming the random function is uniform.  The naive
algorithm only approaches this asymptotically.

Note that if you seed your random-number generator with a 32-bit integer, then
there are only 2^32 possible permutations, no matter how big the array is.  If
you worry about such things, like i do, then you should make multiple passes
over the array.

Following i've included `shuffle.c', a program i wrote a while ago which uses
the algorithm described above.  I've found that it's a useful utility, so you
may want to `clip and save' it.
--
Joe Keane, professional C++ programmer
jgk@osc.com (...!uunet!stratus!osc!jgk)
--
#include <stdio.h>
#include <sys/time.h>

#define DEBUG 0

extern char* malloc();
extern char* realloc();

struct line
{
  int length;
  char* ptr;
};


int main (argc, argv)
  int argc;
  char* argv[];
{
  struct line* master_ptr;
  int master_size;

#if DEBUG
  fputs("shuffle: reading lines...\n", stderr);
#endif
  {
    int master_capacity;
    char *buffer_ptr;
    int buffer_capacity;

    master_capacity = 256;
    master_ptr = (struct line*)malloc(master_capacity * sizeof(struct line));
    if (!master_ptr)
      goto out_of_memory;
    master_size = 0;
    buffer_capacity = 256;
    buffer_ptr = malloc(buffer_capacity);
    if (!buffer_ptr)
      goto out_of_memory;
    for (;;)
    {
      int c;
      int buffer_size;
      char* line_ptr;

      c = getchar();
      if (c == EOF)
	goto eof;
      if (master_size >= master_capacity)
      {
	master_capacity *= 2;
	master_ptr = (struct line*)realloc(master_ptr, master_capacity * sizeof (struct line));
	if (!master_ptr)
	  goto out_of_memory;
      }
      buffer_size = 0;

      while (c != '\n')
      {
	if (buffer_size >= buffer_capacity)
	{
	  buffer_capacity *= 2;
	  buffer_ptr = realloc(buffer_ptr, buffer_capacity);
	}
	buffer_ptr[buffer_size] = c;
	buffer_size++;
	c = getchar();
	if (c == EOF)
	{
	  fputs("shuffle: adding newline at end of file\n", stderr);
	  break;
	}
      }

      line_ptr = malloc(buffer_size);
      if (!line_ptr)
	goto out_of_memory;
      memcpy(line_ptr, buffer_ptr, buffer_size);
      master_ptr[master_size].length = buffer_size;
      master_ptr[master_size].ptr = line_ptr;
      master_size++;
      if (c == EOF)
	goto eof;
    }

  eof:
    free(buffer_ptr);
  }
#if DEBUG
  fprintf(stderr, "shuffle: total of %d lines read\n", master_size);
#endif

  {
    int pass;

    for (pass = 0; pass < 16; pass++)
    {
      struct timeval tv;
      int line_number;

      gettimeofday(&tv, 0);
#if DEBUG
      fprintf(stderr, "shuffle: doing pass %d, time is %d seconds, %d microseconds...\n", pass, tv.tv_sec, tv.tv_usec);
#endif
      srandom(tv.tv_sec ^ tv.tv_usec);
      for (line_number = 0; line_number < master_size; line_number++)
      {
	int other_line;
	struct line temp;

	other_line = random() % (master_size - line_number) + line_number;
	temp = master_ptr[line_number];
	master_ptr[line_number] = master_ptr[other_line];
	master_ptr[other_line] = temp;
      }
    }
  }

#if DEBUG
  fputs("shuffle: writing lines...\n", stderr);
#endif
  {
    int line_number;

    for (line_number = 0; line_number < master_size; line_number++)
    {
      char* ptr;
      char* end;

      ptr = master_ptr[line_number].ptr;
      end = ptr + master_ptr[line_number].length;
      while (ptr < end)
      {
	putchar(*ptr);
	ptr++;
      }
      putchar('\n');
    }
  }

#if DEBUG
  fputs("shuffle: all done\n", stderr);
#endif
  return 0;

 out_of_memory:
  fputs("shuffle: out of memory\n", stderr);
  return 1;
}

dik@cwi.nl (Dik T. Winter) (05/16/91)

In article <1991May15.183447.1437@proto.com> joe@proto.com (Joe Huffman) writes:
 > >To swap N items in an array A[0..(n-1)], the following O(N) algorithm works:
 > 
 > >for (i=0; i < N; i++) {
 > >     swap(i, i + random(N - i));
 > >}
 > 
 >             I used a similar algorithm at first (swap(random(N),random(N))
 > and discovered that it wasn't random enough.
This is not at all similar.
 > 
 > void shuffle(int n, list new_list, list org_list)
 > {
 > 	while(n)
 > 	{
 > 		element temp = get_and_remove_element(org_list,random(n));
 > 		add_element(new_list,temp);
 > 		n--;
 > 	}
 > }
And if you look long enough you will see that this is equivalent to the
originally proposed algorithm.
--
dik t. winter, cwi, amsterdam, nederland
dik@cwi.nl

richard@aiai.ed.ac.uk (Richard Tobin) (05/16/91)

[What the #$%^* is it that puts "Followup-to: poster" on articles???]

In article <1bQCYz#6lHOhh7Jxk2b4WjLh65vrXpL=eric@snark.thyrsus.com> eric@snark.thyrsus.com (Eric S. Raymond) writes:
>In one of my programs, I have a requirement to generate random shuffles
>of arbitrary size.

Swap each element of the array, in order, with a random later-or-same
element of the array, eg

  for(i=0; i<n; i++)
  {
    int j, t;

    j = i + random() % (n-i);

    t = a[i]; a[i] = a[j]; a[j] = t;
  }

This essentially chooses a random element for the first position, then
a random one of those remaining for the second position, and so on.

Note that swapping each element with a random element chosen from the
whole array (ie j = random() % n) doesn't give you a perfectly uniform
distribution.

> Replies by email, please.

This is a common request, so I'm posting.

-- Richard
-- 
Richard Tobin,                       JANET: R.Tobin@uk.ac.ed             
AI Applications Institute,           ARPA:  R.Tobin%uk.ac.ed@nsfnet-relay.ac.uk
Edinburgh University.                UUCP:  ...!ukc!ed.ac.uk!R.Tobin

raman@cs.rochester.edu (Rajeev Raman) (05/16/91)

>In article <1bQCYz#6lHOhh7Jxk2b4WjLh65vrXpL=eric@snark.thyrsus.com>
>eric@snark.thyrsus.com (Eric S. Raymond) writes:
> In one of my programs, I have a requirement to generate random shuffles
> of arbitrary size.  I am using the obvious method, as follows:
>  [exchange a random pair of elements]

Persi Diaconis and Mehrdad Shahshahani (Z. Wahrscheinlichkeitstheorie
und verwandte Gebiete, 57:159-179, (1981)) show the following:

Let T_k(P) be the probability that permutation P is generated
		after k such random transpositions. 
Let D_k = \sum_{all permutations P} | T_k(P) - 1/n! |

Let k = cn + (n ln n)/2. Then

D_k <= b e^{-2c}, for some constant b.

A remark from their paper:

"The problem studied here arose from two independent sources. The first
source involved computer generation of a random permutation. [Knuth (1969)
pp 124--126 shows that the method mentioned on the net involving n-1 
transpositions produces exactly a uniform distribution.] One of us had
a programmer who used the measure [T_k] to generate random permutations.
It is not hard to see that for n > 2, [T_k] is never exactly uniform.
A discussion arose about how large k should be to make [T_k]
exactly uniform."

Another remark (which they also make) to achieve a close to uniform
distribution, the algorithm should do the following: pick i and j
at random, and transpose regardless of whether i=j or not. (Ie, count
as a transposition even the identity transposition.) Otherwise, for
even k you get only even permutations, &c. This also can be used
to prove that after k transpositions (even k) the probability that
you get an even permutation is 1/2 (+/-) some finite quantity, for
n > 2. Thus the distribution is never exactly uniform.

Rajeev.
-- 
Rajeev Raman 	
ARPA: raman@cs.rochester.edu  
UUCP: ...!rutgers!rochester!raman

jroth@allvax.enet.dec.com (Jim Roth) (05/17/91)

In article <4787@osc.COM>, jgk@osc.COM (Joe Keane) writes...
> ... stuff deleted ...
> 
>Note that if you seed your random-number generator with a 32-bit integer, then
>there are only 2^32 possible permutations, no matter how big the array is.  If
>you worry about such things, like i do, then you should make multiple passes
>over the array.

This won't result in any more possible permutations, only a different
set of them.

- Jim