What is a quasi-random number generator

The Monte Carlo method with pseudo and quasi-random numbers

Transcript

1 Advanced Seminar Methods of Experimental Particle Physics WS 2011/2012 The Monte Carlo Method with Pseudo and Quasi-Random Numbers Marco A. Harrendorf Karlsruhe Institute of Technology, Bachelor Physics Lecture given on

2 Contents 1 Introduction Definition Application areas Special features of the Monte Carlo method Probability theory Probability, probability distribution, expected value and dispersion of a discrete random variable Normally distributed random variables The central limit theorem of probability calculus The law of large numbers Generating random numbers 7 4 Pseudo random numbers 7 5 Characteristics of good random number generators 8 6 Quasi-random numbers Approximation of an integral using the Monte Carlo method Properties of quasi-random numbers Literature 10 2

3 1 Introduction 1.1 Definition The Monte Carlo method is a numerical method for solving mathematical problems with the help of the modeling of random variables [5]. In the Monte Carlo method, random numbers are used to approximately describe and calculate mathematical and physical problems based on statistical probability distributions and processes. The Monte Carlo method is mentioned for the first time in 1949 in the theoretical treatise The Monte Carlo method by Metropolis and Ulam [3]. The Monte Carlo method was used earlier: Georges-Louis Leclerc used it in 1727 to determine the circle number π. 1.2 Areas of application The Monte Carlo method essentially comprises two sub-areas: The integration The simulation Both areas have in common that the problem is solved numerically with the help of the Monte Carlo method and thus the result is always associated with an uncertainty can be reduced by increasing the statistical sample. The Monte Carlo method is used, for example, in the following scientific sub-areas: Particle and neutron physics: Large parts of particle and neutron physics would be unthinkable without Monte Carlo simulations. On the one hand, Monte Carlo simulations are used in this area to predict statistical measurement results and, on the other hand, they are used to estimate quantities in a measurement scenario that cannot be determined using direct measurements. Service theory: In service theory, Monte Carlo simulations are used to deal with queue problems. For example, it is possible to predict the utilization of a telephone hotline with the help of the Monte Carlo method based on collected statistical data as a function of the number of telephone lines. Game theory: In game theory, the Monte Carlo method can be used, among other things, to simulate the action of individual individuals in an interdependent decision-making situation and thus predict the outcome of a decision-making situation. 3

4 Econometrics: The Monte Carlo method is an important tool for econometrics, as it allows statements about the future, e.g. future price changes on the stock exchange. Solid-state physics: In solid-state physics, the Monte Carlo method is well suited for solving physical problems that cannot be solved in a closed manner, but require a numerical approach, unless more suitable (more precise) solution methods exist. Apart from the areas mentioned above, Monte Carlo simulations are used in many other areas and the Monte Carlo method is used to solve numerical integrals. 1.3 Special features of the Monte Carlo method The Monte Carlo method has the following special features compared to other numerical methods: Versatile applicability: The Monte Carlo method is suitable for all modelable processes, the course of which is determined by random factors. It is also suitable for all mathematical tasks that can be described by an artificial probabilistic model. Simple structure of the calculation algorithm: The Monte Carlo method is usually relatively easy to implement for a given problem, whereby it is usually possible to start with a simulation with a relatively coarse simulation model and then successively according to the requirements of the simulation accuracy to expand. The calculation algorithm for all Monte Carlo simulations is divided into the following abstract sub-steps: Realization of a random experiment Repetition of the experiment several times with different random numbers Statistical evaluation of the experiment Dependency of the calculation accuracy on the number of repetitions: The accuracy of the result in the Monte Carlo Carlo method increases with the number of realized random attempts and can be estimated for pseudo-random numbers using the law of large numbers and for quasi-random numbers using the Koksma-Hlawka inequality (more on this later). 4th

5 2 Probability theory 2.1 Probability, probability distribution, expected value and dispersion of a discrete random variable Let ξ be a discrete random variable and xi a discrete value with an assigned probability density pi, the probability distribution of the random variable ξ is then given by: ξ = (x 1 x 2 ... xn) p 1 p 2 ... pn The probability of getting the numerical value of xi as a random variable ξ is then as follows: P {ξ = xi} = pi The expected value Eξ of the random variable ξ can then be determined as follows: n Eξ = xipii = 1 The dispersion Dξ of the random variable ξ is: Dξ = E ((ξ Eξ) 2) 2.2 Normally distributed random variables Dξ = E (ξ 2) (Eξ) 2 Compared to a discrete random variable, a normally distributed random variable ξ is based on the entire set of real numbers (ξ (,)) and the distribution density p (x) is as follows: p (x) = 1 (xa) 2 e 2σ 2 2πσ Eξ = a, Dξ = σ 2 From the knowledge of the properties of a normal distribution The following dependencies are known for the expected value Eξ and the dispersion Dξ of the random variable: Eξ = a Dξ = σ 2 The probability that the random variable ξ lies in the interval (x, x) is then calculated over the probability interval as P {x < ξ

6 2.3 The central limit theorem of the probability calculation If one considers N equally distributed and independent random variables ξ 1, ξ 2, ..., ξ N, whose probability densities pi (x) match, then they each have the same expected value and the same dispersion: Eξ 1 = Eξ 2 = ... = Eξ N = m Dξ 1 = Dξ 2 = ... = Dξ N = b 2 For a sum ϱ N formed from these N equally distributed random variables ξ 1, ξ 2, ..., ξ N then ϱ N = ξ 1 + ξ ξ N and its expected value Eϱ N and dispersion Dϱ N is given as follows: Eϱ N = E (ξ 1 + ξ ξ N) = Nm Dϱ N = D (ξ 1 + ξ ξ N) = Nb 2 If one now considers a specially selected, normally distributed random variable χ with the parameters a = Nm, σ 2 = Nb 2, the central limit theorem claims that the probability that the sum ϱ N is in the interval (x, x), just corresponds to the probability integral of the specially selected random variable χ. P {x <ϱ N

7 Limit value of the probability calculation is asymptotically normally distributed and for the parameters of the normal distribution Eϱ N = a = Nm, Dϱ N = σ 2 = Nb 2. Equation 1 then gives the probability that the sum ϱ with a probability of 3σ im Interval (a 3σ, a + 3σ) is as follows: P {(a 3σ) <ϱ N <(a + 3σ)} {P (Nm 3b N) <ϱ N <(Nm + 3b} N) {P (m 3b) <ϱ} N 3b <(m +) NNNP 1 N ξ jm N <3b N j = 1 The last equation shows that with a given probability (here: 3σ) the spread of the random variables ξ i around the expected value m is of the order O (1 N). This means that the statistical uncertainty in the Monte Carlo method decreases approximately with N. For example, if one wants to reduce the statistical uncertainty of a random sum ϱ N by a factor of 2, then one has to increase the number of realized random variables ξ i by a factor of 4. 3 Generation of random numbers The following three methods are mainly used to generate random numbers. Use of random numbers from tables Generation of random numbers using random number generators Calculation of pseudo-random numbers and quasi-random numbers using computer algorithms 4 Pseudo-random numbers Today, pseudo-random numbers have completely replaced tables and random number generators, as they offer a number of advantages: Simple implementation: There is an existing use of pseudo-random numbers Series of program libraries that calculate pseudo-random numbers using the congruence method and provide them in the form of pseudo-random number generators (see, for example, [1]). 7th

8 Continuability: If the last used seed value of a pseudo-random number generator is known, the generation of random numbers can be continued by determining the next random number directly from the last seed value. Versatile applicability: With the help of the pseudo-random number generators and subsequent transformation of the random numbers, random variables that obey a wide variety of distribution functions and follow the law of large numbers can be generated, so that the pseudo-random numbers can be used in many applications. Known quality: For a pseudo-random number generator it is sufficient if its calculation algorithm is examined once for the random distribution of the generated random numbers, since the quality of the generated random numbers is then also known for all subsequent calculations. 5 Characteristics of good random number generators It is essential for the quality of a random number generator on the one hand that the aperiodicity length, i.e. the length after which the generated random numbers repeat, is as large as possible, and on the other hand that the generated random numbers are randomly distributed. To check the quality of pseudo-random number generators, a number of statistical tests are used that examine individual aspects of the distribution. Nowadays the so-called DIEHARD test suite [2] is usually used to test the pseudo-random number generators. This now includes more than 13 different statistical tests. Well-known random number generators that meet all DIEHARD tests and are characterized by a large aperiodicity length are the RANLUX and RANECU random number generators. Both are implemented in the already mentioned C ++ library CLHEP [1] and can therefore be easily integrated into your own projects. 6 Quasi-random numbers Quasi-random numbers are used to approximate an integral using the Monte Carlo method. 6.1 Approximation of an integral using the Monte Carlo method When approximating an integral [x, x] df (u) du of the function f (u) using the Monte Carlo method, in the interval [x, x] N Support points y N selected at random, at which the function value f (y N) is then calculated. 8th

9 The numerical value for the integral then results from the summation of the individual calculated function values: f (u) du = 1 N f (y N) [x, x] d N 6.2 Properties of quasi-random numbers To calculate quasi-random numbers, deterministic number sequences ( so-called lowdiscrepancy number sequences) which correspond to a sequence of points (Q 1, Q 2, ..., QN). The dimension k and the number N of points are given by the sequence of numbers. The advantage of using these number sequences is that all points in the k-dimensional unit cube are evenly distributed, provided that all N points of the number sequence are taken into account. Compared to the pseudo-random numbers, a uniform distribution of the points in the k-dimensional unit cube is achieved not only for N, but already with the number of points N specified by the sequence of numbers. While the pseudo-random numbers follow the law of large numbers, the statistical uncertainty in the quasi-random numbers must therefore be estimated using the Koksma-Hlawa inequality: 1 N f (y N) f (u) du N [x, x] d V (f ) d N n = 1 n = 1 According to this inequality, the statistical uncertainty is determined by the variation V (f) of the function and the discrepancy DN of the sequence of numbers. The discrepancy D N of the low-discrepancy number sequences is of the following order: () (log N) k D N O N The statistical uncertainty thus decreases approximately with N when using quasi-random numbers. In comparison, the statistical uncertainty for the pseudo-random numbers only decreases with N. For this reason, quasi-random numbers are to be preferred over pseudo-random numbers. However, they also have a couple of serious disadvantages: Fixed dimension k: The dimension of the points (Q 1, Q 2, ..., QN) is fixed from the start; accordingly, all random variables must also be of this dimension in the application, otherwise the equal distribution is no longer guaranteed. However, this is the case in very few applications. Fixed number of points N: The number of points N, which lead to an even distribution of the points in the k-dimensional unit cube, is determined by the sequence of numbers used. A subsequent increase in the number of points and thus a better approximation is not possible. Instead, the calculations may have to be carried out again with a new sequence of numbers. 9

10 Literature [1] Boudreau J. et. al .: CLHEP A Class Library for High Energy Physics, retrieval date: [2] Marsaglia G .: The Marsaglia Random Number CDROM including the Diehard Battery of Tests of Randomness, retrieval date: [3] Metropolis N., Ulam S .: The Monte Carlo method, J. Amer. statistical assoc. 44 (1949), [4] RAND Corporation: A million random digits with normal deviates, Glencoe 1955 [5] Sobol I.M .: The Monte Carlo Method, Berlin