This one is for the Mathematicians and Geeks


Security of Random Number Generation: An Annotated Bibliography


A few quick notes

1. This is hardly an in-depth look at the entire field of Random Number Generation; instead this annotated bibliography focuses on the security of Pseudorandom Number Generators.
2. The abbreviations PRNG(s) and PRBG(s) are often used to abbreviate Pseudorandom Number Generator(s) and Pseudorandom Bit Generator(s) respectively.
3. Many of the titles of the papers and books included are links to the actual paper or book. If you find the annotation interesting take a look.
4. Many of the papers and books include "warnings" about the familiarity with different subjects that the work assumes. While the "warnings" should serve as general guidelines, please do not allow them to discourage natural curiosity. Often one does not have to understand every detail of a work to understand the main idea.

Introduction and Overview Material

Random Number Generation is an essential part of any secure application. While it is particularly interesting the field can appear somewhat daunting at first. Below is a list of papers and books that should provide a good background for those just starting to research the field.

This paper/introduction to random numbers was originally prepared as an appendix to the P1363 standard. The paper provides an introduction to the need for random numbers, the criterion for random numbers, and natural random sources. Radioactive sources, quantum effects in semiconductors, photon polarization phases, timing between keyboard and mouse activity, and dev/random on UNIX machines are included as unguessable values. On the other hand, Ellsion identifies network statistics, process statistics, and I/O completion timing and statistics as predictable values that are often misused as seeds. The paper also includes advice on what to when designing a random number generator, including: testing for degeneration of the entropy source, mixing entropy sources and feeding all bits into an initial hash function. Using chaos equations, library random functions, and linear congruential generators are all identified as what to avoid when designing a PRNG. The main conclusion of the paper is that good and simple hardware random number generators, like Intel’s, may make software PRNGs obsolete in the future.

[2] T. Matthews, Suggestions for Random Number Generation in Software, RSA Laboratories Bulletin Num. 1, January 1996. This paper focuses on the distinction between random and pseudorandom numbers. It goes on to explain what it means to be a good PRNG and the various uses of PRNGs in applications. The paper concludes that done properly, random number generation in software can provide the security necessary for most cryptographic systems. The two key points for using a good PRNG are good seed material and a good PRNG algorithm. One solution to a lack of easily available random seeds is to save encrypted seed state for use in subsequent sessions of the application. A time may come when software PRNGs are no longer needed due to the increased availability of hardware RNGs. In the meantime, however properly used software PRNGs can be securely used.

[3] RSA Laboratories, How does one find random numbers for keys?, RSA Laboratories: Frequently Asked Questions: Version 4.1, January 2000. This answer provides a brief overview of natural random sources and requirements for PRNGs. The answer introduces readers to uses of random numbers, components of a PRNG and the importance of the seed of a PRNG. It emphasizes the primary difference between pseudorandom numbers and random numbers is that pseudorandom numbers are necessarily periodic whereas truly random numbers are not. Furthermore, it concludes it is not sufficient for a PRNG just to pass a variety of statistical tests, because the output of the generator may still be predictable. "How does one find random numbers for keys?", is not a in-depth treatment of the PRNGs and their security, but it is an excellent executive summary for those looking for a place to begin research.

[4] B. Schneier, Applied Cryptography, John Wiley & Sons, 1996. This book is an excellent reference for cryptography and computer security in general; however several chapters are especially important in terms of random number generation. Chapter 16: "Pseudo-Random-Sequence Generators and Stream Ciphers" discusses the design and implementation of linear congruential generators, feedback shift registers, additive generators and other stream generators. Chapter 17: "Other Stream Ciphers and Real Random-Sequence Generators" covers the basics concepts involved in designing PRNGs. These concepts include entropy estimation, gathering and distilling of entropy pools. This chapter also provides a good reference to choosing which type of PRNG algorithm to use with a particular stream cipher design. The chapters assume a familiarity with cryptography, and the C programming language.

[5] D. Knuth, The Art of Computer Programming Vol. 2, Addison-Wesley, 1981. Chapter 3 within volume 2 of The Art of Programming provides a good introduction into the definition of randomness, the distinction between being random and unpredictable and many uses of random numbers. It also servers as a reference to the properties, advantages and disadvantages of most PRNG algorithms. Furthermore, Knuth concludes that computers must provide better PRNGs. Knuth advises, “…look at the subroutine library of your computer installation, and replace the random number generator with a good one. Try to avoid being too shocked about what you find.”

[6] A. Menezes, P. van Oorschot, and S. Vanstone, Handbook of Applied Cryptography, CRC Press, 1996. Chapter 5: "Pseudorandom Bits and Sequences", is not only a comprehensive introduction into Random Number Generators, but also an excellent reference for techniques and tests of random number generation. The chapter introduces basic concepts such as random bits vs. random numbers, and uniform vs. non-uniform distributions. It provides techniques for generating both random and pseudorandom numbers, including the ANSI X9.17 generator, and FIPS 186 generator. Furthermore, it describes different statistical tests used to measure the quality of a random bit generator. The chapter emphasizes Golomb's randomness postulates, Maurer's universal statistical test, FIPS 140-1 statistical tests for randomness, and other more basic tests (frequency, serial, poker, runs, and autocorrelation). The chapter concludes with a discussion of several examples of cryptographically secure PRNGs including the RSA PRBG, Micali-Schnorr PRBG, and Blum-Blum-Shub PRBG. The chapter assumes a familiarity with cryptography and pseudo-code.

In-depth Look at the Design and Attacks on PRNGs

Presently, many different PRNG designs exist; however very few are secure. These papers look at different insecure PRNG designs and propose secure PRNG designs and guidelines. I consider these papers the most important to examine for anyone beginning to research PRNGs.

[7]D. Eastlake, S. Crocker, J. Schiller, Randomness Recommendations for Security, RFC 1750, December 1994. The paper focuses on the idea that choosing random quantities to foil a determined attacker is surprisingly difficult, but essential for secure applications. The paper examines many pitfalls and problems of PRNGs in practice, and describes several attacks often used to break PRNGs. The authors suggest hardware techniques are a simple way to produce cryptographically strong randomness. Specifically, the quality of the hardware would not need to be high and the existing computer hardware could be used. However, in the absence of hardware sources of randomness software sources can be used with care. Regardless of the method used the paper also concludes that the size of PRNG output needs to be very large for certain applications. This paper assumes a familiarity with High-Level Statistics, Linear Algebra, and pseudo-code. This paper can be confusing to those just starting to research PRNGs, but if there is only time to look at one paper on PRNG Design, this should be it.

[8] P. Gutman, Software Generation of Practically Strong Numbers, 1998 USENIX Security Symposium, revised June 2000. The paper is the author’s response to his own statement, "Although there are a number of papers which give very broad recommendations on random number generation, none seem to give very specific details or real-life analyses of a particular method". The paper presents a significant number of problems and pitfalls in current PRNGs, and offers a comprehensive guide to a practically strong PRNG. The author emphasizes in order to avoid potential security compromises , all input data should be preprocessed, all output should be post processed, the generator should not depend on user input, the randomness pool should always be sufficiently mixed (especially at startup time), and the generator should consider OS-specific booby traps (i.e. fork() in UNIX). Finally, the paper concludes no special hardware or access to privileged system settings are needed to build such a PRNG. Gutman’s guidelines obviously differ from those in Schneier’s PRNG (listed directly below this annotation). However, Gutman does not analyze the PRNG Schneier et al. proposed. The paper assumes a background in reading pseudo-code, and familiarity with Operating Systems.

[9] J. Kelsey, B. Schneier, D. Wagner, and C. Hall, Cryptanalytic Attacks Pseudorandom Number Generators Fast Software Encryption, Fifth International Workshop Proceedings, March 1998. The paper argues that instead of viewing PRNGs as a small piece within a larger cryptographic element; we should think of PRNGs as their own unique cryptographic primitive. The authors propose a model for PRNGs with these guidelines: basing the PRNG on a strong seed, making sure the PRNG state changes over time, "drastically" reseeding the PRNG at random times, and continually recovering for compromises. The paper goes on to discuss the possible attacks against the model, and the results of attacks against other “real-world” PRNGs used in distributed applications. Furthermore, the paper proposes several open problems in areas of PRNG design that the authors have not discussed. While the authors' do not comment on Peter Gutman’s design above, their PRNG design is significantly different. This paper assumes a strong familiarity with cryptography, cryptanalysis, and PRNGs.

[10] R. Rivest, The MD5 Message-Digest Algorithm, RFC 1321, April 1992. This paper presents the MD5 message-digest algorithm. The algorithm takes an input message and produces a "message digest" or "fingerprint" of the input. While MD5 is slower than its predecessor, it uses more conservative standards including: more rounds, less symmetry between rounds, faster avalanche effect, and a distinct shift amount in each round. Implementation of the MD5 algorithm is fairly simple, but it is conjectured that the difficulty of coming up with two messages having the same message digest is on the order of 2^64 operations, and that the difficulty of coming up with any message having a given message digest is on the order of 2^128 operations. Therefore, the author concludes the algorithm exhibits strong collusion, and is cryptographically secure to be used for PRNG and digital signature applications. However, others are encourage to scrutinize the algorithm for weaknesses to contribute to future PRNG and digital signature algorithm design. Overall the MD5 is an excellent example of the implementation of a PRNG algorithm. This paper requires a strong understanding of a high level programming language and cryptography.

Breaking Traditional PRNGs

A Linear Congruential Generator (aka Traditional PRNG) was one of the simplest and earliest used PRNGs. While there is no obvious dependence between the seeds and the outputs, the graph of any sequence will show up as a regular lattice. This lattice structure is very easy for cryptanalysts to exploit.

[11] G. Marsaglia, Random numbers fall mainly in the planes, Natl. Acad. Sci. Prec. vol. 61, September 1968, pp. 25-28.
This paper presents the discovery of what has become known as the Marsaglia Effect. The paper shows that Linear Congruential Generators are very sensitive with respect to the parameters chosen. If the parameters meet certain criteria all the "random" values generated by the Linear Congruential Generator converge to the values within a specific plane. These parameters can be chosen by the user; thus if a user knows values that match the criteria, one can mount a "chosen input attack" This significantly limits the seeds that need to be searched, making a seed search attack very feasible. For those specifically interested in the Marsaglia Effect and even Linear Congruential Generators in general this paper is an essential paper to examine. This paper assumes a strong familiarity with Abstract Algebra, and Linear Algebra.

[12] J. Boyar, Inferring Sequences Produced by Pseudo-Random Number Generators, Journal of the ACM vol. 36, January 1989, pp.129-141.
This paper presents efficient algorithms for inferring sequences produced by single variant linear congruential generators. Access is assumed to all elements of the sequence prior to the element being predicted. Using the general method presented in the paper, the author proves all linear congruential generators of a specific form insecure. The author's method of inference uses a recurrence that relies on being able to compute integer functions of previous PRNG output in polynomial time. This is one the original discoveries of algorithms with the ability to infer linear congruential generators. This paper assumes a very strong mathematics background, including: Linear Algebra, Abstract Algebra, and High-Level Statistics.

[13] H. Krawczyk, How to Predict Congruential Generators, Advances in Cryptology - Crypto 89, 1989, pp. 138-153.
The paper describes a method to predict all congruential PRNGs.  This paper extends the work done by Joan Boyar in "Inferring Sequences Produced by Pseudo-Random Number Generators". Again access is assumed to all elements of the sequence prior to the element being predicted.  The author concludes that both single variant and multivariate polynomial generators are efficiently predictable if certain integer functions can be performed in polynomial time.  The main new conclusion of this paper is the inclusion of multivariate polynomial generators into the class of PRNGs that can be predicted.   This paper assumes a very strong mathematics background, including: Linear Algebra, Abstract Algebra, and High-Level Statistics.

[14] M. Bellare, S. Goldwasser, and D. Micciancio, "Pseudo-Random" Number Generation within Cryptographic Algorithms: the DSS Case, Advances in Cryptology - Crypto 97, 1997. The paper focuses on the DSS signature algorithm, which requires a signer to generate a new random number with every signature.  If the random numbers for DSS are generated using a linear congruential PRNG or a PRNG that uses modular linear equations, the key (and all future outputs) can be recovered after only seeing a few signatures.  This weakness demonstrates that the entire security of the DSS signature algorithm is effectively compromised by the use of a poor underlying RNG process. The paper concludes that linear congruential PRNGs should be used with extreme caution in cryptographic applications.   This paper assumes a very strong mathematics background, including: Linear Algebra, Abstract Algebra, and High-Level Statistics.

Fixing Traditional PRNGs

After examining the previous section it is obvious that there are many problems with traditional PRNGs. Immediately, after these problems where first discovered many researchers focused on fixes with the design of traditional PRNGs, instead of researching a new design entirely. While any traditional PRNG should be used with extreme caution; they can still be useful (especially when strong cryptographic security is not a requirement).

[15] S. Park, and K. Miller, Random Number Generators: Good Ones are Hard to Find, Journal of the ACM vol. 31, October 1988.
The paper proposes to design and implement a random number generator that will be a minimal standard; meaning this generator should always be used unless one has access to a random number generator proven to be better.  The implementation is presented in a high level language that will port to a variety of systems, restricting the use of specific system functions to generate random values.  Instead the paper presents a method to use a larger range of values than the system is capable of representing to choose one’s random value from. The author concludes with an analysis and explanation of inadequate random number generators present in certain applications.  This paper requires a familiarity with Pascal (or any other high-level language) and a strong background in mathematics including: Linear Algebra and Abstract Algebra.

[16] L. Deng, E. George, and Y. Chu, On Improving Pseudo-random Number Generators, Proceedings of the 1991 Winter Simulation Conference, January 1991.
The paper identifies that many traditional PRNGs do not produce random numbers in a uniform distribution.  However, by adding enough random variants, whether or not they are independent, the random variants sum will converge to a uniform distribution. Furthermore, the paper presents empirical studies that combine "bad", non-uniform, random number generators into uniform random number generators. Both these methods focus on asymptotic uniformity, and asymptotic independence, without the usual assumption of independence of the generators. The empirical study concludes that only a small number of terms is required to achieve a much more uniform sequence. This paper assumes a very strong mathematics background, including: Linear Algebra, Abstract Algebra, and High-Level Statistics.

Serious flaws in PRNGs used in "real world" applications

The following papers are detailed examples of PRNG failures in several "real world" applications. The most common problem is lack of proper initialization (or seeding) for the PRNG. However, problems with the PRNG algorithm and the size of the seed are discussed as well.

[17] B. Arkin, S. Marks, F. Hill, M. Schmid, T. Walls, and G. McGraw, How we Learned to Cheat in Online Poker: A Study in Software Security, Reliable Software Technologies, September 1999. The paper describes how the security or "randomness" of an Internet poker room, was completely compromised. The attackers exploited several significant insecurities. The first was a non-uniform shuffling algorithm, that did not generate all 52! possible decks of cards with equal probability. Another insecurity was rooted in the choice of the seed for the PRNG; an inadequately sized and easily predictable system clock. With these two pieces of knowledge, the attackers were able to effectively limit the seedspace, so they could determine with complete certainty the entire composition of a deck of cards, after only seeing 5 cards. The paper concludes when software that needs to be secure misbehaves, "bad things happen". PRNG designers must be extremely careful to use uniform algorithms and unguessable, large seeds. This paper assumes familiarity with pseudo-code.

[18] I. Goldberg, and D. Wagner, Randomness and the Netscape Browser, Dr. Dobb's Journal, January 1996. Netscape’s Web Browser supports the Secure Sockets Layer (SSL), a cryptographic protocol developed by Netscape to provide secure Internet transactions.  The paper discovers serious flaws in the implementation of SSL, allowing an eavesdropper to decode the encrypted communications. Using easily predictable seeds like process ids and the time of day, allow attackers to narrow down the key space enough to make a search of all possible seeds very feasible.  Netscape would not release the implementation of the PRNG algorithm. Consequently, the PRNG algorithm and it's seed choices were only reviewed by Netscape, no one else. The authors emphasize the necessity for more secure seeds to be chosen, and the need for peer review in the development of any secure software.

[19] T. Roessler, Catastrophic Failure of Strip Password Generation, Bug Traq at SecurityFocus.com, April 2001. Strip (Secure Tool for Recalling Important Passwords) is an encrypted password notebook for Palm. It features functionality for generating and storing/recalling passwords secure passwords.  However, this function has several significant flaws including a linear congruential generator as the PRNG, poor seeds for the PRNG based on clock ticks, and an implementation bug, that further limits the range of the PRNG seed. An attacker could easily limit the seedspace to 2^8, an extremely trivial size, by combining these weaknesses. The author concludes that Strip is a good example of what not to do; a poor PRNG algorithm with poor seeds and poor seed size. The paper also adds, that that many more Palm applications may have similar problems due to the small size of physical memory, and lack of good, unguessable, seeds. The paper assumes familiarity with the C programming language.
[20] Bryn Dole, Steve Lodin, and E. H. Spafford, Misplaced trust: Kerberos 4 Session Keys, 1997 Symposium on Network and Distributed Systems Security 1997. One of the commonly accepted principles of software design for security is that making the source code openly available leads to better security. The presumption is that the open publications of source code will lead others to review the code for errors, however, this openness is no guarantee of correctness. One of the most widely published and used pieces of security software in recent memory is the MIT implementation of the Kerberos authentication protocol. In the design of the protocol, random session keys are the basis for establishing the authenticity of sevice requests. Because of the way that the Kerberos Version 4 implementation selected its random keys, the secret keys could easily by guessed in a matter of seconds. This paper discusses the difficulty of generating good random numbers, the mistakes that were made in implementing Kerberos Version 4, and the breakdown of software engineering that allowed this flaw to remain unfixed for ten years.

Help for using and designing PRNGs

There are a lot of problems one can run into when using or designing a PRNG. The following papers offer some new tools, reactions to failures, and good advice about how to avoid common PRNG problems.
[21] R. Baldwin, Preliminary Analysis of the BSAFE 3.x Pseudorandom Number Generators, RSA Laboratories Bulletin Num. 8, September 1998. The paper presents the algorithms used in the BSAFE and JSAFE pseudorandom number generators.  The author presents their design and explains why these PRNGs are believed to be cryptographically strong.  This discussion highlights the assumptions that are made about the underlying hash functions to ensure good cryptographic properties for the PRNGs.  Furthermore, the paper examines how the PRNGs resist various attacks. This examination allows the author to explore the implications of the design choices made in building BSAFE and JSAFE. Finally, the statistical properties of the PRNGs are examined based on classical randomness and Marsaglia Diehard tests.  The paper concludes that these are good algorithms for generating cryptographically strong pseudorandom numbers.

[22] S. Pincus, and R. Kalman, Not all (possibly) "random" sequences are created equal, Proc. Natl. Acad. Sci. USA Vol. 94, April 1997. The paper focuses on the need to assess the randomness of a single finite sequence.  The paper presents the utility of being able to assess this condition using the notion of sequential irregularity (aka approximate entropy); which can be evaluated.  It concludes that sequential irregularity refines randomness and normality. The basis of the theory is both process independence in classical probability theory and normality reduce to a binary, YES/NO determination of whether a function will converge to 0. This explicitly characterizes asymptotic behavior of sequence variation. The authors believe that by determining the randomness of a sequence PRNG designers, will be able to better assess the randomness of PRNG output.  It also poses several unanswered questions concerning the approximate entropy within infinite sequences. This paper assumes an expert understanding of Statistics and Statistical Theory.

[22] PokerStars.com, The Integrity of our Games, PokerStars.com, January 2000. This description of security at an online card room was motivated by “How we Learned to Cheat in Online Poker: A Study in Software Security”.  The description emphasizes the use of a shuffling algorithm with a uniform distribution, one that generates all 52! shuffles only once. Furthermore, the site uses an Intel RNG and user input to supply entropy, and the SHA-1 cryptographic hash algorithm to mix the entropy gathered from both sources.  Finally, the site maintains a SHA-1 based PRNG to provide even more security and protection from user data attacks. This description is a very interesting; it clearly shows that at least one gambling site, received a stern wake-up call after Reliable Software Technologies published their “break-in”. The description assumes a familiarity with pseudo-code.

[24] G. McGraw, and J. Viega, Make you software behave: Playing the numbers, IBM Developer Works, August 2000. This paper serves as a good introduction to the uses of a PRNG and how a PRNG works.  After establishing this, the authors go on to give an in-depth treatment of common PRNG pitfalls and many of the “bugs” that have plagued applications that use bad PRNGs.  The paper focuses on methods to break traditional PRNGs, non-uniform PRNG algorithms, poor choices for seeds of PRNGs, and small keyspaces or “seedspaces” for PRNGs.  The paper emphasizes that if a PRNG falls victim to any one of these “bugs” the security of the entire application is in jeopardy.  Furthermore, it concludes every PRNG has its comparative advantages and disadvantages, the user must decide which PRNG is appropriate for their application.  No PRNG algorithm is appropriate for all tasks.

[25] G. McGraw, and J. Viega, Make your software behave: Software strategies, IBM Developer Works, September 2000. This paper is a software answer to many of the concerns posed in the previous paper. Initially, this paper emphasizes that a hardware random number generator should always be used over software.  However, in the absence of hardware the paper provides several suggestions for PRNGs.  These suggestions include using the Operating System (only if it is UNIX or a flavor of LINUX), the Java programming language, and/or C/C++. Finally, the paper offers a look at several even better PRNGs including MD5, and Yarrow.  The paper concludes that if you need cryptographically strong security, hardware random number generation is your best option.   However, in its absence a good PRNG like MD5 or Yarrow can be adequate.  Operating Systems, Java, and C/C++ random number generation should only be used if a generally unpredictable random number is needed, (i.e. a nonce).

[26] R. Baldwin, Proper Initialization for the BSAFE Random Number Generator, RSA Laboratories Bulletin Num. 8, January 1996. This bulletin advises PRNG users how to avoid a poor technique of seeding the random number generators, BSAFE 1.x, and BSAFE 2.x. While the bulletin focuses on these two specific PRNGs, the methods of fixing this common problem can applied to other PRNG algorithms. The seeding problem is that each call to the seeding function assumes a substantial amount of unpredictability, when in reality, there it is only receiving one byte of unpredictability. One method to fix this would be to make many more calls to seeding function, thus getting a cumulative number of unpredictable bytes. However, an alternative is to include a counter with every byte of the seed. The author concludes that for many applications the second approach extracts nearly as much randomness as the previous approach and requires much less time. This paper assumes a strong familiarity with general PRNG design.

Source: http://www.math.columbia.edu/~woit/wordpress/

Comments

Popular Posts