[comp.sources.misc] v02i096: xrand, xcrypt -- a random number generator & application

agn@UNH.CS.CMU.EDU (Andreas Nowatzyk) (04/14/88)

comp.sources.misc: Volume 2, Issue 96
Submitted-By: "Andreas Nowatzyk" <agn@UNH.CS.CMU.EDU>
Archive-Name: xrand

The attached random-number generator was writen because of a need for
a good source of random numbers for a research project that involved
systems without such facility (for example a research prototype with
only a C-compiler and minimal libraries).

Since the results are critical to my work, a serious attempt was made
to write a 'really good' RNG: that is several CPU weeks on assorted
machines were spend to try out (and discard) various approaches.

The generator below is based on 2 well known methods: Linear Congruential
(LCGs) and Additive Congruential generators (ACGs). 

The LCG produces the longest possible sequence of 32 bit 'random' numbers,
each being unique in that sequence (it has only 32 bits of state).
It suffers from 2 problems:
a) Independence isn't great, that is the (n+1)th number is somewhat
   related to the preceding one, unlike flipping a coin where knowing the
   past outcomes don't help to predict the next result.
b) Taking parts of a LCG generated number can be quite non-random: for
   example, looking at only the least significant byte gives a permuted
   8-bit counter (that has a  period length of only 256).
The advantage of an LCA is that it is perfectly uniform when run for the
entire period length (and very uniform for smaller sequences too, if
the parameters are chosen carefully).

ACG's have extremly long period lengths and provide good independence.
Unfortunately, uniformity isn't not too great. Furthermore, I didn't
find any theoretically analysis of ACG's that addresses uniformity.

The RNG given below will return numbers generated by an LCA that are
permuted under control of a ACG. 2 permutations take place: the
4 bytes of one LCG generated number are subjected to one of 16 permutations
selected by 4 bits of the ACG. The permutation a such that byte of the
result may come from each byte of the LCG number. This effectively destroys
the structure within a word. Finally, the sequence of such numbers is
permuted within a range of 256 numbers. This greatly improves independence.

xcrypt is a recreational application (as I have little need for secrecy) 
of this random number generator to en/decrypt files. However, I do believe
that this byproduct is quite strong and a challenge ($50)
to this effect has been posted to sci.crypt. Breaking xcrypt would probably
reveal a flaw in xrand and I'm highly interested in any such problem.

  Cheers  --  Andreas

-------------------- Cut here --------------------
#
# type    sh /usru0/agn/x.shar   to unpack this archive.
#
echo extracting xrand.1...
cat >xrand.1 <<'!E!O!F!'
.TH XRAND 3 4/13/88
.CM 4
.SH NAME
xrand \- pseudo random number generators
.SH SYNOPSIS
void rnd_init(seed)
unsigned seed;

long rnd_i()

unsigned long rnd_u()

long rnd_ri(range)
long range;

double rnd_01d()

double rnd_ned(lamda)
double lamda;
.SH DESCRIPTION
xrand is a collection of pseudo-random number generators that were
carefully tested for uniform distribution and independence.
These tests included testing parts of the generated numbers
(say a byte or a bit) for the same properties.

.B rnd_init
must be called before any random numbers are drawn.
Different seeds will
yield different, but deterministic sequences of random numbers with
like statistical properties.
.B rnd_i
returns positive integers,
.B rnd_u
returns unsigned integers (all 32 bit are 
.I random
), 
.B rnd_ri
returns integers in the range [0..range-1],
.B rnd_01d
returns doubles in the range of [0.0..1.0),
.B rnd_ned
returns negative-exponential distributed doubles in
the range [0.0..+infinity).
.SH ENVIRONMENT
.B rnd_ned
requires
.B log()
from the math library.
.SH "SEE ALSO"
.I rand(3C),
which produces pseudo-random numbers which are not very random and
.I
random(3),
which is about 45% faster than xrand but has inferior uniformity.
.SH BUGS
.B xrand
assumes that long words have 32 bits and that 2's complement
arithmetic is used.
.SH HISTORY
.TP
13-Apr-88  Andreas Nowatzyk (agn) at Carnegie-Mellon University
Created.
!E!O!F!
#
# type    sh /usru0/agn/x.shar   to unpack this archive.
#
echo extracting xrand.c...
cat >xrand.c <<'!E!O!F!'
#include <math.h>

/* Random number generators:
 *
 *  rnd_init (unsigned seed) 
 *			: initializes the generator
 *
 *  rnd_i ()		: returns positive integers [0,0x7fffffff]
 *  rnd_u ()		: returns unsigned's        [0,0xffffffff]
 *  rnd_ri (long n)	: returns positive integers [0,n-1]
 *  rnd_01d ()		: returns doubles	    [0.0,1.0)
 *			  Note: ")" is no typo - rnd_01d will not return a 1.0,
 *                              but can return the next smaller FP number.
 *  rnd_ned (double lam): returns neg. exponential distributed doubles [0.0,+inf)
 *
 *  Algorithm M as describes in Knuth's "Art of Computer Programming", Vol 2. 1969
 *  is used with a linear congruential generator (to get a good uniform
 *  distribution) that is permuted with a Fibonacci additive congruential
 *  generator to get good independence.
 *
 *  Bit, byte, and word distributions were extensively tested and pass
 *  Chi-squared test near perfect scores (>7E8 numbers tested, Uniformity
 *  assumption holds with probability > 0.999)
 *
 *  Run-up tests for on 7E8 numbers confirm independence with
 *  probability > 0.97.
 *
 *  Plotting random points in 2d reveals no apparent structure.
 *
 *  Autocorrelation on sequences of 5E5 numbers (A(i) = SUM X(n)*X(n-i), i=1..512)
 *  results in no obvious structure (A(i) ~ const).
 *
 *  On a SUN 3/60, rnd_u() takes about 19.4 usec per call, which is about 44%
 *  slower than Berkeley's random() (13.5 usec/call).
 *
 *  Except for speed and memory requirements, this generator outperforms
 *  random() for all tests. (random() scored rather low on uniformity tests,
 *  while independence test differences were less dramatic).
 *
 *  Thanks to M.Mauldin, H.Walker, J.Saxe and M.Molloy for inspiration & help.
 *
 *  (c) Copyright 1988 by A. Nowatzyk
 *
 */

/* LC-parameter selection follows recommendations in 
 * "Handbook of Mathematical Functions" by Abramowitz & Stegun 10th, edi.
 */
#define LC_A 66049		    /* = 251^2, ~= sqrt(2^32)			*/
#define LC_C 3907864577		    /* result of a long trial & error series    */

#define Xrnd(x) (x * LC_A + LC_C)   /* the LC polynomial			*/
			
static unsigned long Fib[55];	    /* will use X(n) = X(n-55) - X(n-24)	*/
static int Fib_ind;		    /* current index in circular buffer		*/
static unsigned long Xrnd_var;	    /* LCA - recurrence variable		*/
static unsigned long auxtab[256];   /* temporal permutation table		*/
static unsigned long prmtab[64] = { /* spatial permutation table		*/
    0xffffffff, 0x00000000,  0x00000000,  0x00000000,  /* 3210 */
    0x0000ffff, 0x00ff0000,  0x00000000,  0xff000000,  /* 2310 */
    0xff0000ff, 0x0000ff00,  0x00000000,  0x00ff0000,  /* 3120 */
    0x00ff00ff, 0x00000000,  0xff00ff00,  0x00000000,  /* 1230 */

    0xffff0000, 0x000000ff,  0x00000000,  0x0000ff00,  /* 3201 */
    0x00000000, 0x00ff00ff,  0x00000000,  0xff00ff00,  /* 2301 */
    0xff000000, 0x00000000,  0x000000ff,  0x00ffff00,  /* 3102 */
    0x00000000, 0x00000000,  0x00000000,  0xffffffff,  /* 2103 */

    0xff00ff00, 0x00000000,  0x00ff00ff,  0x00000000,  /* 3012 */
    0x0000ff00, 0x00000000,  0x00ff0000,  0xff0000ff,  /* 2013 */
    0x00000000, 0x00000000,  0xffffffff,  0x00000000,  /* 1032 */
    0x00000000, 0x0000ff00,  0xffff0000,  0x000000ff,  /* 1023 */

    0x00000000, 0xffffffff,  0x00000000,  0x00000000,  /* 0321 */
    0x00ffff00, 0xff000000,  0x00000000,  0x000000ff,  /* 0213 */
    0x00000000, 0xff000000,  0x0000ffff,  0x00ff0000,  /* 0132 */
    0x00000000, 0xff00ff00,  0x00000000,  0x00ff00ff   /* 0123 */
};

union hack {			    /* used to access doubles as unsigneds	*/
    double d;
    unsigned long u[2];
};

static union hack man;		    /* mantissa bit vector			*/

rnd_init (seed)			    /* modified: seed 0-31 use precomputed stuff */
    unsigned seed;
{
    register unsigned long u;
    register int i;
    double x, y;
    union hack t;

    static unsigned seed_tab[32] = {
		0xbdcc47e5, 0x54aea45d, 0xec0df859, 0xda84637b,
		0xc8c6cb4f, 0x35574b01, 0x28260b7d, 0x0d07fdbf,
		0x9faaeeb0, 0x613dd169, 0x5ce2d818, 0x85b9e706,
		0xab2469db, 0xda02b0dc, 0x45c60d6e, 0xffe49d10,
		0x7224fea3, 0xf9684fc9, 0xfc7ee074, 0x326ce92a,
		0x366d13b5, 0x17aaa731, 0xeb83a675, 0x7781cb32,
		0x4ec7c92d, 0x7f187521, 0x2cf346b4, 0xad13310f,
		0xb89cff2b, 0x12164de1, 0xa865168d, 0x32b56cdf  };

    if (seed < 32)
	u = seed_tab[seed];
    else
	u = seed ^ seed_tab[seed & 31];

    for (i = 55; i--;)		    /* set up Fibonacci additive congruential	*/
	Fib[i] = u = Xrnd(u);

    for (i = 256; i--;)
	auxtab[i] = u = Xrnd(u);

    Fib_ind = u % 55;		    /* select a starting point			*/

    Xrnd_var = u;

    if (sizeof(x) != 2 * sizeof(unsigned long)) {
	x = 0.0;
	y = 1.0;
	y /= x;			    /*** intentional divide by 0: rnd_01d will
					 not work because a double doesn't fit
					 in 2 unsigned longs on your machine! ***/
    };

    x = 1.0;
    y = 0.5;
    do {			    /* find largest fp-number < 2.0		*/
	t.d = x;
	x += y;
	y *= 0.5;
    } while (x != t.d && x < 2.0);

    man.d = 1.0;
    man.u[0] ^= t.u[0];
    man.u[1] ^= t.u[1];		    /* man is now 1 for each mantissa bit	*/
}

long rnd_i ()
/*
 * returns a positive, uniformly distributed random number in [0,0x7fffffff]
 */
{ 
    register unsigned long i, j, *t = Fib;

    i = Fib_ind;
    j = t[i];				    /* = X(n-55) */
    j -= (i >= 24) ? t[i - 24] : t[i + 21]; /* = X(n-24) */
    t[i] = j;
    if (++i >= 55) i = 0;
    Fib_ind = i;

    t = &auxtab[(j >> 24) & 0xff];
    i = *t;
    Xrnd_var = *t = Xrnd(Xrnd_var);
    t = &prmtab[j & 0x3c];

    j =  *t++ & i;
    j |= *t++ & ((i << 24) | ((i >>  8) & 0x00ffffff));
    j |= *t++ & ((i << 16) | ((i >> 16) & 0x0000ffff));
    j |= *t   & ((i <<  8) | ((i >> 24) & 0x000000ff));
    
    return j & 0x7fffff;
}

unsigned long rnd_u ()
/*
 * same as rnd_i, but gives full 32 bit range
 */
{ 
    register unsigned long i, j, *t = Fib;

    i = Fib_ind;
    j = t[i];				    /* = X(n-55) */
    j -= (i >= 24) ? t[i - 24] : t[i + 21]; /* = X(n-24) */
    t[i] = j;
    if (++i >= 55) i = 0;
    Fib_ind = i;

    t = &auxtab[(j >> 24) & 0xff];
    i = *t;
    Xrnd_var = *t = Xrnd(Xrnd_var);
    t = &prmtab[j & 0x3c];

    j =  *t++ & i;
    j |= *t++ & ((i << 24) | ((i >>  8) & 0x00ffffff));
    j |= *t++ & ((i << 16) | ((i >> 16) & 0x0000ffff));
    j |= *t   & ((i <<  8) | ((i >> 24) & 0x000000ff));
    
    return j;
}

long rnd_ri (rng)
    long rng;
/*
 * randint: Return a random integer in a given Range [0..rng-1]
 *          Note:  0 < rng
 */
{
    register unsigned long  r, a;

    do {
	r = rnd_i();
	a = (r / rng) + 1;
	a *= rng;
    } while (a >= 0x7ffffff);
    
    a--;
    return a - r;
}

double rnd_01d ()
/*
 * returns a uniformly distributed double in the range of [0..1)
 *         or  0.0 <= rnd_01d() < 1.0 to be precise
 *
 * Note: this code assumes that 2 'unsigned long's can hold a 'double'
 *       (works on SUN-3's, SUN-4's, MIPS, VAXen, IBM RT's)
 */
{
    union hack t;

    t.d = 1.0;

    t.u[0] |= rnd_u() & man.u[0];	      /* munch in 1st part   */
    t.u[1] |= rnd_u() & man.u[1];	      /* munch in 2nd part   */

    return t.d - 1.0;
}

double rnd_ned (lam)
    double lam;
/*
 * returns a neg. exponential distributed double in the range of [0..+infinity)
 *         or  0.0 <= rnd_neg() < +infinity to be precise
 *
 * Note: this code assumes that 2 'unsigned long's can hold a 'double'
 *       it also assumes that 'log()' behaves as advertised.
 *
 */
{
    union hack t;

    t.d = 1.0;

    t.u[0] |= rnd_u() & man.u[0];	      /* munch in 1st part   */
    t.u[1] |= rnd_u() & man.u[1];	      /* munch in 2nd part   */

    return -log(2.0 - t.d) / lam;
}
!E!O!F!
#
# type    sh /usru0/agn/x.shar   to unpack this archive.
#
echo extracting xcrypt.1...
cat >xcrypt.1 <<'!E!O!F!'
.TH XCRYPT 1 4/13/88
.CM 4
.SH NAME
xcrypt \- data encryption/decryption program
.SH SYNOPSIS
encrypt <plain-text file> <cipher-text file>

decrypt <cipher-text file> <plain-text file>
.SH DESCRIPTION
.B xcrypt
uses the observation that a good pseudo-random number
generator can be used to generate a strong encryption.
.B xcrypt
is based on
.B xrand(3)
that was extensively tested for randomness (some CPU-week were devoted
to this effort).

.B xcrypt
invoked with a name starting with either 'e' or 'E' will write an
encrypted version of the plain-text file to the cipher text file.
Any other invocation will cause an decryption.
Both files are treated as binary files. ASCII-text files will become
binary files upon encryption.
The key string will be prompted from <stdin>.
.B xcrypt
will compute for a few seconds before starting the translation, which
is essentially I/O limited.
This initialization phase is indispensable
and was designed to thwart brute force key guessing attempts.
Key lengths of 10-1000 characters are meaningful.
.SH ENVIRONMENT
.B xcrypt
requires
.B xrand(3)
.SH HISTORY
.TP
13-Apr-88  Andreas Nowatzyk (agn) at Carnegie-Mellon University
Created.
!E!O!F!
#
# type    sh /usru0/agn/x.shar   to unpack this archive.
#
echo extracting xcrypt.c...
cat >xcrypt.c <<'!E!O!F!'
/*
 * xcrypt: a Encryption / Decryption program.
 *
 * xcrypt is loosely based on an idea that was proposed by M.Mauldin @ CMU
 * (and potentially many other): use a pseudo-random number generator (RNG)
 * to produce one-time pads.
 *
 * So, the basic approach is simply to initialize the RNG with the
 * encryption key and then decide what part of the RNG output is
 * used to encrypt the message. In xcrypt, a high quality RNG is
 * employed that was extensively tested for uniform distribution
 * and independence. This is expected to yield a rather strong encryption.
 *
 * Selecting the starting point of the random number sequence is based on a
 * seed that is derived from the unix time()-call and the process ID.
 * This seed (4 bytes) is prefixed to the ciphertext (similar to the typical
 * unix password encoding scheme). You may consider transmitting the seed
 * by an other channel.
 *
 * Actually, both the seed and the key are used to determine the initial
 * state of the RNG, which is less computationally expensive than letting
 * the RNG run for value-of(seed) iterations.
 *
 * This particular RNG has 9990bits of state (yes, more than 1Kbyte!), so
 * so keys of 1000 char are still meaningful (albeit cumbersome).
 *
 * In order to thwart a key-guessing, brute force attack, the RNG is
 * run for 60000 to 120000 iterations before the output is used for
 * encryption. The RNG output of this initialization phase is used to
 * construct 2 sets of 256 permutations of [0,1,...,255], that will
 * be used later. Thus, any shortcut of these iterations (the actual
 * number depends on the key and is not known to the attacker) would
 * also have to produce all the intermediate results. Given the low
 * complexity of an iteration, it's non-linearity and the intermediate
 * result requirement, it appears to me that it is provable that no
 * dramatic shortcut exists.
 *
 * The RNG delivers 32bit random numbers. All 32 bits are
 * used to encrypt an 8bit character:
 *    1. use 8bit to XOR the plaintext character
 *    2. use 8bit to select a random permutation (from set 1, see above)
 *    3. use 8bit to XOR the result of step 2
 *    4. use 8bit to select a random permutation (set 2)
 * Every bit of the RNG output will change the plain-to-cipher text
 * character mapping, but unlike the simple XOR case, it is not possible
 * to reconstruct the RNG output from a plain/cipher text pair (you
 * would have to guess about 24bits!).
 *
 * The strength of xcrypt is based on:
 *   1. a high quality RNG with plenty of state information.
 *   2. a computation intensive initialization phase
 *   3. 'information loss' in the encryption phase that
 *      prevents the reconstruction of the 'PAD' even in the
 *      known plain text case.
 *
 * Note: 'xcrypt' is a small just-for-fun program, but the underlying
 *       RNG was written and extensively tested as part of a serious
 *       research project.
 *
 * April, 1988  A. Nowatzyk
 *
 */

#include <stdio.h>
#include <sys/file.h>

char perm_tab1[0x10000];	/* permutation tables		*/
char perm_tab2[0x10000];

#include "xrand.c"

main(argc, argv)
int	argc;
char	*argv[];
{
    unsigned	    seed;
    char            buf[4];
    int             ifd, ofd;

    if (argc != 3 || 0 > (ifd = open(argv[1], O_RDONLY)) || 
        0 > (ofd = open(argv[2], O_WRONLY | O_CREAT, 0600))) {
	fprintf(stderr, "en/decrypt <input-file> <output-file>\n");
	exit(1);
    }

    if (*argv[0] == 'e' || *argv[0] == 'E') {	/** encryption **/

	seed = time(0l);	/* get an arbitrary seed	*/
	seed ^= getpid();

	get_key(seed);		/* get the key			*/

	buf[0] = seed;		/* write out the seed		*/
	buf[1] = seed >> 8;
	buf[2] = seed >> 16;
	buf[3] = seed >> 24;
	write(ofd, buf, 4);

	encrypt(ifd, ofd);
	
    } else {			/** decryption ******************/

	if (4 != read(ifd, buf, 4)) {	/* retrieve seed	*/
	    fprintf(stderr, "Can't read '%s'\n", argv[1]);
	    exit(1);
	}
	seed  =  buf[0]        & 0x000000ff;
	seed |= (buf[1] << 8)  & 0x0000ff00;
	seed |= (buf[2] << 16) & 0x00ff0000;
	seed |= (buf[3] << 24) & 0xff000000;

	get_key(seed);		/* get the key			*/

	inv_tab(perm_tab1);	/* invert permutation tables	*/
	inv_tab(perm_tab2);
	
	decrypt(ifd, ofd);
    }

    close(ifd);
    close(ofd);
    exit(0);
}

get_key(seed)
    unsigned seed;
{
    char            buf[1024];
    register int    i, j, k;
    static char    *key_chars = 
	"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
    register char   t;

    rnd_init(seed);

    printf("Key:");		/* I know, there are better ways */
    gets(buf);

    for (i = 0, j = 0; buf[i] && i < 1024; i++)
	for (k = 0; key_chars[k]; k++)
	    if (buf[i] == key_chars[k]) {
		j++;
		Fib[(7 * k) % 55] ^= 625111 * k + rnd_u();
		buf[i] = '?';	/* munch key over		*/
		break;
	    }

    if (j < 10)
	fprintf(stderr, "Warning: your key is rather short\n");

    i = rnd_u();		/* initial permutation table 1	*/
    for (j = 0; j < 256; j++) {
	for (k = 0; k < 256; k++)
	    perm_tab1[j * 256 + k] = i + k;
	i++;
    }
    for (j = 30000 + rnd_ri(30000l); j--;) {	/* shuffle pt 1 */
	i = rnd_u();
	k = (i & 0xff00) | ((i >> 24) & 0xff);
	i &= 0xffff;
				/* the random number is broken into
				 * 3 parts, n1, n2, n3 of 1 byte each.
				 * the n1-th and n2-th elements of
                                 * the n3-th permutation are transposed.
                                 */
	t = perm_tab1[k];
	perm_tab1[k] = perm_tab1[i];
	perm_tab1[i] = t;
    }

    i = rnd_u();		/* initial permutation table 2	*/
    for (j = 0; j < 256; j++) {
	for (k = 0; k < 256; k++)
	    perm_tab2[j * 256 + k] = i + k;
	i++;
    }
    for (j = 30000 + rnd_ri(30000l); j--;) {	/* shuffle pt 2 */
	i = rnd_u();
	k = (i & 0xff00) | ((i >> 24) & 0xff);
	i &= 0xffff;
	t = perm_tab2[k];
	perm_tab2[k] = perm_tab2[i];
	perm_tab2[i] = t;
    }
}

inv_tab(pt)			/* compute inverse permutations */
    char *pt;
{
    register int i, j;
    char t[256];

    for (i = 0; i < 256; i++) {
	for (j = 0; j < 256; j++)
	    t[pt[i * 256 + j] & 0xff] = j;
	for (j = 0; j < 256; j++)
	    pt[i * 256 + j] = t[j];
    }
}

encrypt (ifd, ofd)		/* encrypt stuff		*/
    int ifd, ofd;
{
    register int i, j, k;
    register unsigned long u;
    register char *p;
    char buf[1024];

    while (0 < (i = read(ifd, buf, 1024))) {
	for (p = buf, j = i; j--; p++) {

	    u = rnd_u();		/* draw a random number */

	    k = *p & 0xff;		/* get char to encrypt	*/

	    k ^= u & 0xffff;		/* step 1		*/
	    k = perm_tab1[k] & 0xff;

	    k ^= (u >> 16) & 0xffff;	/* step 2		*/
	    *p = perm_tab2[k];
	}
	write(ofd, buf, i);
    }
}

decrypt (ifd, ofd)		/* decrypt stuff		*/
    int ifd, ofd;
{
    register int i, j, k;
    register unsigned long u;
    register char *p;
    char buf[1024];

    while (0 < (i = read(ifd, buf, 1024))) {
	for (p = buf, j = i; j--; p++) {

	    u = rnd_u();		/* draw a random number */

	    k = *p & 0xff;		/* get char to decrypt	*/

	    k |= (u >> 16) & 0xff00;	/* undo step 2		*/
	    k = (perm_tab2[k] ^ (u >> 16)) & 0xff;

	    k |= u & 0xff00;		/* undo step 1		*/
	    *p = perm_tab1[k] ^ u;
	}
	write(ofd, buf, i);
    }
}
!E!O!F!
-- 
   --  Andreas Nowatzyk  (DC5ZV)

   Carnegie-Mellon University	     Arpa-net:   agn@unh.cs.cmu.edu
   Computer Science Department         Usenet:   ...!seismo!unh.cs.cmu.edu!agn