Random Number Generators

Generation of Random Numbers

Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin.
John von Neumann (1951)

Up to now we have not discussed where the ``random numbers'' come from or indeed what we mean operationally by a random number. First randomness is a property of an infinite sequence x_i with i = 1, 2, . . . . It is impossible to tell whether a single number is random or not. Random is a negative property, it is the absence of any order.

Note that there are three types of so-called random numbers.

Truly Random - numbers exhibiting "true" randomness, such as found by counts of a Geiger measuring radioactive decay.
Pseudo-Random - PR numbers (PRN) having the appearance of randomness, nevertheless the numbers exhibit a repeatable pattern or period of re-occurance (Hopefully very, very long because for 1024x1024 sites on 2-D lattice using 10⁷ MC sweeps, one needs 10¹³ RNG!).
Quasi-Random - numbers that are not random at all really (in fact, Quasi-random is not really a proper name). QR numbers are chosen to fill solution space with maximum distances between points so that "holes" are filled sequentially. An example would be numbers from [0,10], where the numbers could be chosen in any order and all space would be covered with no repeats. Using this with MC techniques for integration can improve variance to 1/N, such as for simple definite integrals. (Does not do so for all.)

In this section we will only discuss the uniform distribution, x, is either uniformly distributed in the range (0, 1) or x is an integer in [1, 2^M]. (M is the number of bits). For simplicity, assume the latter representation. Then we can say the sequence is not random if knowing the previous values, allows you to predict something about the next value, (other than it is one of the allowed integers). With a random sequence, the predictions of even a very clever person are not better than average.

What are some of the desired properties of random numbers?
Uncorrelated, deterministic, long-periods between repeats, uniform converage of range, efficiency of algorithm.

Determinism. This is an odd property for a random sequence to have, since if it is deterministic, one can predict with certainty the next number, if you know the algorithm. However, it is a useful property for debugging since often one needs to check the code. Computers as they are now constructed are deterministic. The only possibilities for randomness are to read an "external" device, say the microsecond clock or a very unreliable memory. This is not done because it is very much too slow and too expensive. Everyone uses pseudo-random number generators (PRNG) instead, perhaps seeding (or initializing) them with the clock. It is likely that the PRNG are as reliable as a hardware version of a random number generator. A perfect pseudo-random number generator appears to be perfectly random, unless you happen to know both the algorithm and its internal state. From now on, we only discuss pseudo-random numbers. Let s be the internal state of the prng. Then a PRNG is a mapping from one state to another: s_n+1 = f(s_n). The n^th random number is then some mapping of the internal state onto the integers or reals.
Cycle length. Any pseudorandom number generator eventually has to repeat or cycle since there are a finite number of internal states. One would like the cycle length to be longer than a typical run. Consider the worst case scenerio, where all a code is doing is generating PRNs. A fast computer now can generate 10⁹ operations/second. If we run it for 1 week we can generate 10¹⁵ = 2⁴⁹ PRNs. Hence if the PRNG has the maximal cycle length we should have at least 49 bits in the internal state. The continuing performance increase in computer performance, means that methods that worked in the past, may fail with tomorrow's computers. Luckily, it is easy to make very long cycle lengths.
Uniformity. This property means that if we go through a complete cycle, all the integers occur once. If we make a test with N occurrences of x, the number in the interval (a, b) should be in the range N (b-a) ± ( N (b-a) )^-1/2. (This assumes that N ( b - a ) is much greater than 1.) Usually uniformity is easy to guarantee if you go over an entire cycle.
Correlation. We would like our random numbers to be completely uncorrelated with each other. In general this means that any average such as:

< f₁(x_i+1) f₂(x_i+2) . . . f_k(x_i+k) > = < f₁(x_i+1) > < f₂(x_i+2) > . . . < f_k(x_i+k) >

or for any set of functions f. The average is over all internal states of the random number generator. It turns out that Monte Carlo simulations of statistical mechanics can be sensitive to very long range correlations. Which particular correlations they are sensitive to is not very obvious, so often the only real test to run the code for a case where the exact answer is know or alternatively to try several different PRNGs.

If all you are doing is one dimensional integrals, one doesn't have to worry about correlation.

Quasi random number generators give up correlation, so the sequence will be more uniform than a random sequence would be, and thereby converge to the mean value quicker than N^-1/2. (reference Physics Today)

Based on desirables, what should the zeroth, first and second moments of a good UNIFORM RNG be on the range [0,1)? That is, what is average <R_n> and the variance <(R_n - <R>)²>?

Types of Pseudo-random number generators (PRNG)

Some of the common types of random number generators are:

Linear Congruential Generator (LCG): This is what is provided on most computers

I_n+1 = ( aI_n + c ) mod m

The initial (or seed) value to start the PRNG sequence is denoted I₀. "mod" denotes modulo, or the remainder after division by m. Prime modulo are the best but 2^N work fastest on binary computers. If c not zero, then LCG is called Mixed LCG.

Real Pseudo-Random number is then obtained by:

R_n = I_n/ float(m) for [0,1)

R_n = I_n/ float(m-1) for [0,1]

For c=0, the maximum cycle length m= 2^M occurs when a = 1 mod 4 and c is odd (often chosen as 1). For c not zero, the maximum cycle length= 2^M-2 occurs when a = 3 mod 8 or 5 mod 8 (which is prefered) and initial seed is odd. (Note that low-order bits are not random!) For m = P (prime number), P-1 is the maximum period IFF a = primitive element mod P. There is always one integer that maps to itself, e.g. for c=0, I₀ = 0 maps to itself.

Park and Miller recommend the following portable generator: m=p= 2³¹ - 1 (i.e., 2,147,483,647), with a=16,807 and c=0. It is found that the larger M is the better the resulting sequence. Some 48 bit generators are very good. 64 bit are even better.

In general, never use the lower order bits from this type of generator as they are not very random.

Beware of using large groups of PRNs together (i.e. n-tuples). The congruential PRNG have the disadvantage that they are not free of sequential correlations on successive calls. They fall into planes. If k random numbers at a time are used to plot points in k-dimensional space, then the points lie on k-1 dimensional planes rather than filling up all k-space. There are at most m^1/k planes, or fewer (which is even worse) if a, c, or M are not well chosen. If you take k numbers together, the correlation can be as high as 2^-M/k.

Tausworth Generator.

I_i = I_{i - 250} XOR I_i-147

where XOR indicates a bitwise exclusive-or operation. This is very fast on the computer. It also has a very long internal state (the last 250 integers). Hence the cycle length is very long. It must be seeded correctly otherwise it can get stuck.

Lagged Fibonacci generator. I_i = ( I_{i - 250} + I_i-147) mod m
Recall Fibinnacci sequence for positive integers: I_i+2 = ( I_i+1 + I_i )
Initial seed is I₀ = 0 and I₁ = 1 . Sequence is 0, 1, 1, 2, 3, 5, 8, 13, 21, ... and defines a geomteric progression found in nature, such as growth of spirals in shells, flower petal counts, etc.
This differs from the Tausworth generator because it is regular addition (modulo m, with m = 2^M usually), although can use multiplication and exclusive or as well. The major drawback is the greater storage requirement of L₁-1 random numbers to play initial seed (where previous by L₁ > L₂ lags are 250 and 147 in above eq.)

Seeds for PRNG

For debugging, as stated in class, the seed can be set to a known constant (such as a large prime (odd number)) for reproducibility of PRN sequence. Otherwise the initial seed of the RNG should be a random odd number. S.L. Anderson [``Random Number Generators on Vector Supercomputers and Other Advanced Architectures,'' SIAM Review, 32, 221-251 (1990)]. recommends setting initial seed to integer: I_o= iyr + 100*(imonth-1 + 12*(iday-1 + 31*(ihour + 24*(imin + 60*isec) ) ) ) where the variables are integer year (0-99), month, day, and time. To introduce maximum variation in the seed into the least significant digit, it has been suggested to invert the order of the variables (see Computational Science & Engineering Project) to the "second of the century" by I_o= isec + 100*(imin + 60*(ihr + 24*(iday-1 + 31*(imon -1 + 12*iyr) ) ) ) and make I_o = ior(I_o,1) (in Fortran) to make sure it is odd!

Overflows and Negative Integers must be dealt with carefully. Recall that a (and m) is set large to provide sufficient randomness (and long periods). But, if two 31-bit numbers are multiplied, a 62-bit number is generated (an overflow) and the 32-bit (which is the sign bit for the mantissa) can be set (a negative number). Negative numbers then occur about half the time. One method is to use a bit mask to handle the most significant 32-bit and set it to `0'. A logical AND operation can be performed on the random, negative integer by using the bit mask imask=z'7fffffff' (which is zero followed by 31 ones in base-2). In Fortran, this is done by I_n = iand(I_n, imask). But masking only works when m is a powers of two, or else iand destroys sequence. In C, one can use unsigned integer and avoid the sign bit problem altogether.

Parallel PRNG

All of the above generators can be generalized to parallel computation. That presents some interesting issues to ensure that each processor has a completely independent generator. We have developed SPRNG a parallel random number library.

Testing Generators

Statisticians have invented many tests to tell if sequences are random such as those given by Knuth.. The most complete test is to examine how uniform n-tuples (or vectors) are.You will do this in the homework exercise. If the generator is random, equal numbers of vectors will be in each bin within statistical fluctuations. These tests of random number generators (RNG) are not adequate in determining completely the usefulness of a particular RNG because they only test numbers 5-10 at a time. But much longer sequences can be important in applications. Thus the construction and testing of prng is an art and not a science.The best test is to use several different generators on your application and see if the results are independent of the generator.But for these to be a good test, your runs have to be as long as the production runs. (Why?) SPRNG allows you to switch the generator just by changing a link argument in the Makefile.

See other on-line material:

Random Numbers on the Web

See Tutorial on RNG

References:

D. Knuth, Art of Computer Programming, Vol2. Seminumerical Algorithms, Second edition. Addison-Wesley, Reading, Massachusetts, 1981.

W.H. Press and S.A. Teukolsky, Numerical Recipes

Portable PRNG: George Marsaglia and Arif Zaman, Comput. Phys. 8, 117 (1994)
(From Physics Resource Link download Computer in Physics Articles)

P. Coddington, Tests of random number generators using Ising model simulations . Int. J. of Mod. Phys. C, 7(3):295-303, 1996.

Srinivasan, A., D. M. Ceperley and M. Mascagni, "Random Number Generators for Parallel Applications" in Monte Carlo Methods in Chemical Physics, Vol. 105, Advances in Chemical Physics, eds. D. M. Ferguson, J. I. Siepmann and D. G. Truhlar,Wiley (1999).