[sci.math] pseudo-random number generation

leo@athena.mit.edu (Oblivious Transfer) (04/27/88)

For those interested, the following is the introduction to the
Blum-Micali paper on pseudorandom number generation.  Although the paper
is rather difficult reading, the introduction is fairly easy to follow
and quite enjoyable.  I have included the numbered references, but not
mentioned what they are; also notation has been made TeX-like for
precision.  Hopefully there are not too many typos.

John Leo
leo@xx.lcs.mit.edu
leo@athena.mit.edu

******

Siam Journal of Computing, Volume 13, Number 4, November 1984
(pages 850-864)

How to Generate Cryptographically Strong Sequences of Pseudo-random Bits

Manuel Blum and Silvio Micali

ABSTRACT:  We give a set of conditions that allow one to generate 50-50
unpredictable bits.
  Based on those conditions, we present a general algorithmic scheme for
constructing polynomial-time deterministic algorithms that stretch a
short secret random input into a long sequence of unpredictable
pseudo-random bits.
  We give an implementation of our scheme and exhibit a pseudo-random
bit generator for which any efficient strategy for predicting the next
output bit with better thant 50-50 chance is easily transformable into
an "equally efficient" algorithm for solving the discrete logarithm
problem.  In particular, if the discrete logarithm problem cannot be
solved in probabilistic polynomial time, no probabilistic
polynomial-time algorithm can guess the next output bit better than by
flipping a coin: if "head" guess "0", if "tail" guess "1".

KEY WORDS. randomness, pseudo-random number generation,
unpredictability, random self-reducibility

1. INTRODUCTION
1.1 RANDOMNESS AND COMPLEXITY THEORY.  We introduce a new method of
generating sequences of pseudo-random bits.  Any such method implies,
directly or indirectly, a definition of randomness.

Much effort has been devoted in the second half of this century to make
precise the notion of randomness.  Let us informally recall Kolmogorov's
influential definition [18]:

   A sequence of bits A = a_1, a_2, ..., a_k is random if the length of
   the minimal program outputting A is at least k.

We remark that the length of a program, from a computational complexity
point of view, is a rather unnatural measure.  We want to investigate a
more operative defintion of randomness in the light of complexity
theory.

A mental experiment.  A and B want to play head and tail in four
different ways.  In all of them A "fairly" flips a "fair" coin.  In the
first way, A asks B to bet and then flips the coin.  In such a case we
expect B to win with a 50% frequency.  In the second way, A flips the
coin and, while it is still spinning in the air, she asks B to bet.  We
are still expecting B to win with a 50% frequency.  However, in the
second case the outcome of the toss is determined when B bets: in
principle, he could solve the equation of the motion and win!  The third
way is similar to the second one: B is allowed to bet when the coin is
spinning in the air, but he is also given a pocket calculator.  Nobody
will doubt that B is still going to win with 50% frequency: before he
can initialize any computation the coin will have come up head or tail.
The fourth way is similar to the third, except B is given a very
powerful computer, able to take pictures of the spinning coin, and
quickly computer its speed, momentum, etc.  We will not say that B will
always win, be we may suspect he may win 51% of the time!

The purpose of the above example is to suggest that
    The randomness of an event is relative to the specific model of
    computation with a specified amount of computing resources.

The links between randomness and the computation model were pointed out
by Michael Sipser in [31], where he defines randomness with respect to
finite state automata (see also [24]).  In his very nice paper [29],
Shamir considers also the factor of the computing resources, presents
significant progress in this direction and points out some open problems
as well.

In this paper we investigate the randomness of k-bit long sequences with
respect to the computation model of Boolean circuits with only Poly(k)
gates.

1.2 CSPRB GENERATORS.  We introduce the notion of a cryptographically
strong speudo-random bit generator (CSPRB generator) and show under
which conditions it can be constructed.  A CSPRB generator is a program
G that, upon receiving as input a random number i (hereafter referred to
as the seed), outputs a sequence of pseudo-random bits b_1, b_2, b_3,
... .  G possesses the following properties:
   1)  The bits b_k's are easy to generate.  Each b_k is output in time
polynomial in the length of the seed.
   2)  The bits b_k's are unpredictable.  Given the generator G and
b_1,...b_s, the first s output bits, but not the seed i, it is
computationally infeasible to predict the s+1st bit in the sequence with
better than 50-50 chance.  Here s is polynomial in the length of the
seed.

Our generators are an improvement of Shamir's pseudo-random number
generators.  In [29], Shamir presents programs that from a short secret
random seed, output a sequence of numbers x_i's such that the ability of
predicting the next output is equivalent to inverting the RSA function
[27].  The main difference between outs and Shamir's generator is:
   Shamir's generator outputs numbers and not bits.  Such numbers could
be unpredictable and yet of very special form.  In particular every bit
of (information about) the next number in the sequence could be heavily
biased or predictable with high probability.  As a consequence, if the
numbers so generated are 100 bits long, they might not be uniformly
spread in the interval [1,2^{100}].

1.3 PSEUDO-RANDOM SEQUENCES AND STATISTICAL TESTS.  Passing given
statistical tests is the key point for evaluating pseudo-random
sequences.  The classical sequence x_{i+1} = ax_i + b mod n, provides a
fast way of generating pseudo-random numbers.  Such a sequence is known
(for a clever choice of the parameters a, b, and n) to generate "well
mixed numbers" (see Knuth [17]).  However it is not cryptographically
strong.  Plumstead [25] shows that the sequence can be inferred even
when a, b and n are all unknown.

In constrast, our bit-sequences cannot be generated as fast, but cannot
be inferred either.  This is so because they have "embedded" some hard
problem.

An analysis of a particularly simple pseudo-random number generator
appears in Blum, Blum and Shub [9].  They point out that well mixed
sequences in which hard problems are embedded can nevertheless be poor
pseudo-random sequences.  Something more is needed to construct good
generators of pseudo-random sequences; what that is is pointed out in
section 2.

Unpredictability of the next output bit is the key test studied in this
paper.  In an earlier version of this paper [10], we presented a
deterministic polynomial-time algorithm that stretched a random k-bit
long seed into a polynomially (in k) long bit-sequence.  Any
probabilistic polynomial-time algorithm, correctly predicting the next
bit with probability greater than 1/2 + e [I'm using e for epsilon here]
in a so produced pseudo-random sequence, could be easily transformed
into a probabilistic algorithm, running in expected poly(k,e^{-1}) time,
for solving the discrete logarithm problem for a fraction e of the
primes of length k.

Though we were aware of the polynomial dependency on e^{-1} (in fact
[10] Lemma 2 explicitly states it), our main theorem summarized our
results stating that the next bit in our pseudo-random sequences could
not be predicted in polynomial (in k) time with probability greater than
1/2 + e, for 0 < e < 1.

Yao [33] was the first one to realize the importance of emphasizing the
polynomial functional dependency on e^{-1} and made excellent use of it
(see section 1.3.2).

Indeed, without any changes in our algorithm, e could be replaced with
the smallest value that will keep the running time polynomial.  Since in
our case the running time is polynomial in k and e^{-1}, we henceforth
use 1/poly(k) for e in this paper.

We now proceed to a formal treatment.

1.3.1 THE NEXT-BIT-TEST.  Let P be a polynomial and S = {S_k} be a
collection of multisets such that S_k contains P(k)-bit long sequences
(the same sequence s may belong more than once to S_k).  Let P_1 be a
polynomial.  A predicting collection C = {C^{i}_{k}} is a collection of
circuits such that each circuit C^i_k has less than P_1(k) gates, i
Boolean inputs, i < P(k), and one Boolean output.  On the input the
first i bits of a sequence s randomly selected in S_k, C^i_k will output
a bit b.  Let p^{C}_{k,i} denote the probability that b = the i+1st bit
of s.  We say that the collection S passes the next-bit-test if for all
predicting collections C, all polynomials Q, all sufficiently large k
and all i < P(K),
			p^{C}_{k,i} < 1/2 + 1/Q(k).

Ability to predict the next bit from the preceding ones is indeed a
statistical test for a bit-sequence.  In fact, if a bit-sequence were
generated by independent coin flips, no strategy would predict the next
bit with a success rate even slightly better than 50-50.  This
particular test is passed by the sequences producted by a CSPRB
generator.  Therefore CSPRB generators produce evenly distributed
numbers: just divide the output sequences into k-bit long segments.

Subsequently, Yao [33] showed the following very interesting result.

1.3.2. YAO'S STATISTICAL TEST.  The following definition is derived from
Yao [33].  As before, the collection S = {S_k} is such that the multiset
S_k contains P(k)-bit long sequences.

Let P_1 be a polynomial.  A polynomial-size statistical test is a
collection C = {C_k} of circuits.  Each circuit C_k has less than P_1(k)
gates, P(k) Boolean inputs and one Boolean output.  Let p^{C}_{k,S}
denote the probability that C_k outputs 1 on input a randomly selected
sequence in S_k, and p^{C}_{k,R} the probability that C_k outputs a 1 on
a randomly selected P(k)-bit long sequence.  The collection S passes all
polynomial-size statistical tests if for all polynomial-size statistical
test C, for any polynomial Q, for all sufficiently large k,
		|p^{C}_{k,S} - p^{C}_{k,R}| < 1/Q(k)

THEOREM (Yao).  A collection S = {S_k} passes the next-bit-test if and
only if it passes all polynomial-size statistical tests.

1.3.3. RELATED TESTS FOR STRINGS.  Earlier definitions and theorems
about tests for distinguishing strings belonging to two different sets
can be found in Goldwasser and Micali [13].  They presented a
probabilistic encryption scheme in which a single bit b is, wiht the
help of a coin, encoded by a k-bit long string B, called a probabilistic
encryption of b.  Here k is a security parameter.  Both 0 and 1 will
have many possible probabilistic encodings, but all of them are uniquely
decodable.  They defined a probabilistic encryption scheme to be
bit-secure if for all polynomials P and Q, for all sufficiently large k,
no circuit with less than P(k) gates can correctly guess whether B is
the encryption of 0 or 1 with probability greater than 1/2 + 1/Q(k).
Under an intractability assumption for the quadratic residuosity
problem, they present a probabilistic encryption scheme that is
bit-secure.

A probabilistic encryption of an n-bit long (n < P_1(k) for some
polynomial P_1) string b_1,...,b_n is a sequence B_1,...,B_n where each
B_i is a probabilistic encryption of b_i.  Let P be a polynomial.  A
separator is defined to be a collection of circuits C = {C_k}.  Each
circuit C_k has less than P(k) gates, kn Boolean inputs and one Boolean
output.  For a string s = s_1,...,s_n, let p^{C}_{s,k} denote the
probability that C_k outputs 1 on input a probabilistic encryption of s.
The encryptions of the string x = x_1,...,x_n are unseparable from the
encryptions of y_1,...,y_n if for all separators C and for all
polynomials Q, for sufficiently large k,
		|p^{C}_{x,k} - p^{C}_{y,k}| < 1/Q(k).

The computational difficulty of separating the encryptions of
polynomially long bit-sequences reduces to one of correctly guessing the
decoding of an encrypted single bit.

THEOREM (Goldwasser and Micali).  For any pair of n-bit long strings x
and y, the encryptions of x and y are unseparable if and only if the
encryption scheme is bit-secure.

1.4 INSTANCES OF THE CSPRB GENERATOR MODEL.  A general algorithmic
scheme for constructing CSPRB generators is presented in section 2.  The
first instance of this schem is based on the intractability assumption
for the discrete logarithm problem and is described in section 4.  Other
interesting instances of the general model have subsequently been found
based on the intractability assumption of various one-way function.  Yao
[33] and Blum, Blum and Sub [9] found instances based on the
intractability of deciding quadratic residuosity modulo composite
numbers whose factorization is unknown.  Yao [33] and Goldwasser, Micali
and Tong [14] implemented CSPRB generators based on the intractability
of factoring.  Yao [33] also proves that one can obtain instances of the
CSPRB generator scheme if one-way functions with a particular property
exist.

1.5 APPLICATIONS.  Recently, a large number of cryptographic protocols
for protecting private communication and business transactions have been
developed [7],[8],[13],[14],[15],[20].  The security of these new
protocols is based both on the security of some encryption scheme and
the ability of the participants to generate large random numbers unknown
to an adversary.  Security vanishes if an adversary, though not able to
break the encryption scheme, can successfully predict the output of the
psuedo-random number generator.  This is not an abstract worry as shown
by Plumstead.  The problem calls for the use of CSPRB generators.

In private key cryptography, one-time pads constitute the best type of
cryptosystem.  In practice, one-time pads are approximated by
pseudo-random number generators [4].  Shamir [29] points out that
"unpredictable" pseudo-random number generators may be a valid
substitute for one-time pads.  Therefore CSPRB generators are
particularly good substitutes for one-time pads: two partners who both
possess the same CSPRB generator and have secretly exchanged a random
seed, are actually sharing a long bit-sequence that can be successfully
used as a one-time pad.

The fact that the cryptographic strength of CSPRB generators depends
only on the secrecy of the seed and not the secrecy of the program,
makes them an available tool to mathematically nonsophisticated users.
In fact, it is unreasonable to assume that a business person should be
able to design a cryptographically strong generator.  However, anyone
can buy such a program from the public market and secretly select a
short random seed.