leo@athena.mit.edu (Oblivious Transfer) (04/27/88)
For those interested, the following is the introduction to the Blum-Micali paper on pseudorandom number generation. Although the paper is rather difficult reading, the introduction is fairly easy to follow and quite enjoyable. I have included the numbered references, but not mentioned what they are; also notation has been made TeX-like for precision. Hopefully there are not too many typos. John Leo leo@xx.lcs.mit.edu leo@athena.mit.edu ****** Siam Journal of Computing, Volume 13, Number 4, November 1984 (pages 850-864) How to Generate Cryptographically Strong Sequences of Pseudo-random Bits Manuel Blum and Silvio Micali ABSTRACT: We give a set of conditions that allow one to generate 50-50 unpredictable bits. Based on those conditions, we present a general algorithmic scheme for constructing polynomial-time deterministic algorithms that stretch a short secret random input into a long sequence of unpredictable pseudo-random bits. We give an implementation of our scheme and exhibit a pseudo-random bit generator for which any efficient strategy for predicting the next output bit with better thant 50-50 chance is easily transformable into an "equally efficient" algorithm for solving the discrete logarithm problem. In particular, if the discrete logarithm problem cannot be solved in probabilistic polynomial time, no probabilistic polynomial-time algorithm can guess the next output bit better than by flipping a coin: if "head" guess "0", if "tail" guess "1". KEY WORDS. randomness, pseudo-random number generation, unpredictability, random self-reducibility 1. INTRODUCTION 1.1 RANDOMNESS AND COMPLEXITY THEORY. We introduce a new method of generating sequences of pseudo-random bits. Any such method implies, directly or indirectly, a definition of randomness. Much effort has been devoted in the second half of this century to make precise the notion of randomness. Let us informally recall Kolmogorov's influential definition [18]: A sequence of bits A = a_1, a_2, ..., a_k is random if the length of the minimal program outputting A is at least k. We remark that the length of a program, from a computational complexity point of view, is a rather unnatural measure. We want to investigate a more operative defintion of randomness in the light of complexity theory. A mental experiment. A and B want to play head and tail in four different ways. In all of them A "fairly" flips a "fair" coin. In the first way, A asks B to bet and then flips the coin. In such a case we expect B to win with a 50% frequency. In the second way, A flips the coin and, while it is still spinning in the air, she asks B to bet. We are still expecting B to win with a 50% frequency. However, in the second case the outcome of the toss is determined when B bets: in principle, he could solve the equation of the motion and win! The third way is similar to the second one: B is allowed to bet when the coin is spinning in the air, but he is also given a pocket calculator. Nobody will doubt that B is still going to win with 50% frequency: before he can initialize any computation the coin will have come up head or tail. The fourth way is similar to the third, except B is given a very powerful computer, able to take pictures of the spinning coin, and quickly computer its speed, momentum, etc. We will not say that B will always win, be we may suspect he may win 51% of the time! The purpose of the above example is to suggest that The randomness of an event is relative to the specific model of computation with a specified amount of computing resources. The links between randomness and the computation model were pointed out by Michael Sipser in [31], where he defines randomness with respect to finite state automata (see also [24]). In his very nice paper [29], Shamir considers also the factor of the computing resources, presents significant progress in this direction and points out some open problems as well. In this paper we investigate the randomness of k-bit long sequences with respect to the computation model of Boolean circuits with only Poly(k) gates. 1.2 CSPRB GENERATORS. We introduce the notion of a cryptographically strong speudo-random bit generator (CSPRB generator) and show under which conditions it can be constructed. A CSPRB generator is a program G that, upon receiving as input a random number i (hereafter referred to as the seed), outputs a sequence of pseudo-random bits b_1, b_2, b_3, ... . G possesses the following properties: 1) The bits b_k's are easy to generate. Each b_k is output in time polynomial in the length of the seed. 2) The bits b_k's are unpredictable. Given the generator G and b_1,...b_s, the first s output bits, but not the seed i, it is computationally infeasible to predict the s+1st bit in the sequence with better than 50-50 chance. Here s is polynomial in the length of the seed. Our generators are an improvement of Shamir's pseudo-random number generators. In [29], Shamir presents programs that from a short secret random seed, output a sequence of numbers x_i's such that the ability of predicting the next output is equivalent to inverting the RSA function [27]. The main difference between outs and Shamir's generator is: Shamir's generator outputs numbers and not bits. Such numbers could be unpredictable and yet of very special form. In particular every bit of (information about) the next number in the sequence could be heavily biased or predictable with high probability. As a consequence, if the numbers so generated are 100 bits long, they might not be uniformly spread in the interval [1,2^{100}]. 1.3 PSEUDO-RANDOM SEQUENCES AND STATISTICAL TESTS. Passing given statistical tests is the key point for evaluating pseudo-random sequences. The classical sequence x_{i+1} = ax_i + b mod n, provides a fast way of generating pseudo-random numbers. Such a sequence is known (for a clever choice of the parameters a, b, and n) to generate "well mixed numbers" (see Knuth [17]). However it is not cryptographically strong. Plumstead [25] shows that the sequence can be inferred even when a, b and n are all unknown. In constrast, our bit-sequences cannot be generated as fast, but cannot be inferred either. This is so because they have "embedded" some hard problem. An analysis of a particularly simple pseudo-random number generator appears in Blum, Blum and Shub [9]. They point out that well mixed sequences in which hard problems are embedded can nevertheless be poor pseudo-random sequences. Something more is needed to construct good generators of pseudo-random sequences; what that is is pointed out in section 2. Unpredictability of the next output bit is the key test studied in this paper. In an earlier version of this paper [10], we presented a deterministic polynomial-time algorithm that stretched a random k-bit long seed into a polynomially (in k) long bit-sequence. Any probabilistic polynomial-time algorithm, correctly predicting the next bit with probability greater than 1/2 + e [I'm using e for epsilon here] in a so produced pseudo-random sequence, could be easily transformed into a probabilistic algorithm, running in expected poly(k,e^{-1}) time, for solving the discrete logarithm problem for a fraction e of the primes of length k. Though we were aware of the polynomial dependency on e^{-1} (in fact [10] Lemma 2 explicitly states it), our main theorem summarized our results stating that the next bit in our pseudo-random sequences could not be predicted in polynomial (in k) time with probability greater than 1/2 + e, for 0 < e < 1. Yao [33] was the first one to realize the importance of emphasizing the polynomial functional dependency on e^{-1} and made excellent use of it (see section 1.3.2). Indeed, without any changes in our algorithm, e could be replaced with the smallest value that will keep the running time polynomial. Since in our case the running time is polynomial in k and e^{-1}, we henceforth use 1/poly(k) for e in this paper. We now proceed to a formal treatment. 1.3.1 THE NEXT-BIT-TEST. Let P be a polynomial and S = {S_k} be a collection of multisets such that S_k contains P(k)-bit long sequences (the same sequence s may belong more than once to S_k). Let P_1 be a polynomial. A predicting collection C = {C^{i}_{k}} is a collection of circuits such that each circuit C^i_k has less than P_1(k) gates, i Boolean inputs, i < P(k), and one Boolean output. On the input the first i bits of a sequence s randomly selected in S_k, C^i_k will output a bit b. Let p^{C}_{k,i} denote the probability that b = the i+1st bit of s. We say that the collection S passes the next-bit-test if for all predicting collections C, all polynomials Q, all sufficiently large k and all i < P(K), p^{C}_{k,i} < 1/2 + 1/Q(k). Ability to predict the next bit from the preceding ones is indeed a statistical test for a bit-sequence. In fact, if a bit-sequence were generated by independent coin flips, no strategy would predict the next bit with a success rate even slightly better than 50-50. This particular test is passed by the sequences producted by a CSPRB generator. Therefore CSPRB generators produce evenly distributed numbers: just divide the output sequences into k-bit long segments. Subsequently, Yao [33] showed the following very interesting result. 1.3.2. YAO'S STATISTICAL TEST. The following definition is derived from Yao [33]. As before, the collection S = {S_k} is such that the multiset S_k contains P(k)-bit long sequences. Let P_1 be a polynomial. A polynomial-size statistical test is a collection C = {C_k} of circuits. Each circuit C_k has less than P_1(k) gates, P(k) Boolean inputs and one Boolean output. Let p^{C}_{k,S} denote the probability that C_k outputs 1 on input a randomly selected sequence in S_k, and p^{C}_{k,R} the probability that C_k outputs a 1 on a randomly selected P(k)-bit long sequence. The collection S passes all polynomial-size statistical tests if for all polynomial-size statistical test C, for any polynomial Q, for all sufficiently large k, |p^{C}_{k,S} - p^{C}_{k,R}| < 1/Q(k) THEOREM (Yao). A collection S = {S_k} passes the next-bit-test if and only if it passes all polynomial-size statistical tests. 1.3.3. RELATED TESTS FOR STRINGS. Earlier definitions and theorems about tests for distinguishing strings belonging to two different sets can be found in Goldwasser and Micali [13]. They presented a probabilistic encryption scheme in which a single bit b is, wiht the help of a coin, encoded by a k-bit long string B, called a probabilistic encryption of b. Here k is a security parameter. Both 0 and 1 will have many possible probabilistic encodings, but all of them are uniquely decodable. They defined a probabilistic encryption scheme to be bit-secure if for all polynomials P and Q, for all sufficiently large k, no circuit with less than P(k) gates can correctly guess whether B is the encryption of 0 or 1 with probability greater than 1/2 + 1/Q(k). Under an intractability assumption for the quadratic residuosity problem, they present a probabilistic encryption scheme that is bit-secure. A probabilistic encryption of an n-bit long (n < P_1(k) for some polynomial P_1) string b_1,...,b_n is a sequence B_1,...,B_n where each B_i is a probabilistic encryption of b_i. Let P be a polynomial. A separator is defined to be a collection of circuits C = {C_k}. Each circuit C_k has less than P(k) gates, kn Boolean inputs and one Boolean output. For a string s = s_1,...,s_n, let p^{C}_{s,k} denote the probability that C_k outputs 1 on input a probabilistic encryption of s. The encryptions of the string x = x_1,...,x_n are unseparable from the encryptions of y_1,...,y_n if for all separators C and for all polynomials Q, for sufficiently large k, |p^{C}_{x,k} - p^{C}_{y,k}| < 1/Q(k). The computational difficulty of separating the encryptions of polynomially long bit-sequences reduces to one of correctly guessing the decoding of an encrypted single bit. THEOREM (Goldwasser and Micali). For any pair of n-bit long strings x and y, the encryptions of x and y are unseparable if and only if the encryption scheme is bit-secure. 1.4 INSTANCES OF THE CSPRB GENERATOR MODEL. A general algorithmic scheme for constructing CSPRB generators is presented in section 2. The first instance of this schem is based on the intractability assumption for the discrete logarithm problem and is described in section 4. Other interesting instances of the general model have subsequently been found based on the intractability assumption of various one-way function. Yao [33] and Blum, Blum and Sub [9] found instances based on the intractability of deciding quadratic residuosity modulo composite numbers whose factorization is unknown. Yao [33] and Goldwasser, Micali and Tong [14] implemented CSPRB generators based on the intractability of factoring. Yao [33] also proves that one can obtain instances of the CSPRB generator scheme if one-way functions with a particular property exist. 1.5 APPLICATIONS. Recently, a large number of cryptographic protocols for protecting private communication and business transactions have been developed [7],[8],[13],[14],[15],[20]. The security of these new protocols is based both on the security of some encryption scheme and the ability of the participants to generate large random numbers unknown to an adversary. Security vanishes if an adversary, though not able to break the encryption scheme, can successfully predict the output of the psuedo-random number generator. This is not an abstract worry as shown by Plumstead. The problem calls for the use of CSPRB generators. In private key cryptography, one-time pads constitute the best type of cryptosystem. In practice, one-time pads are approximated by pseudo-random number generators [4]. Shamir [29] points out that "unpredictable" pseudo-random number generators may be a valid substitute for one-time pads. Therefore CSPRB generators are particularly good substitutes for one-time pads: two partners who both possess the same CSPRB generator and have secretly exchanged a random seed, are actually sharing a long bit-sequence that can be successfully used as a one-time pad. The fact that the cryptographic strength of CSPRB generators depends only on the secrecy of the seed and not the secrecy of the program, makes them an available tool to mathematically nonsophisticated users. In fact, it is unreasonable to assume that a business person should be able to design a cryptographically strong generator. However, anyone can buy such a program from the public market and secretly select a short random seed.