Session 5
Generation of Random Numbers and Random Variates
Ernesto Gutierrez-Miravete
Spring 2002

1 Generation of Random Numbers and Pseudo-Random Numbers

Recall that for a random variable $X$ which is uniformly distributed in $[0, 1]$ the uniform probability density function is

\[ f(x) = \begin{cases} 1, & 0 \le x \le 1 \\ 0, & \text{otherwise} \end{cases} \]

while its cumulative distribution function is

\[ F(x) = \begin{cases} 0, & x < 0 \\ x, & 0 \le x < 1 \\ 1, & x \ge 1 \end{cases} \]

A random number (RN) stream is a sequence of independent random variables uniformly distributed in $[0, 1]$. A truly random stream of numbers has the following characteristics:

- Uniformly distributed.
- Continuous-valued.
- $E(R) = 1/2$.
- $\sigma^2 = 1/12$.
- No autocorrelation between numbers.
- No anomalous runs.

In practice one always works with streams of pseudo-random numbers (PRN). These have approximately the same characteristics as RNs. PRNs are generated with a computer using a numerical algorithm embedded in a computer program or routine (a pseudo-random number generator, PRNG). The requirements of a good PRNG routine are:

- Fast.
- Portable.
- Long cycle.
- Replicability.
- Produces PRNs with the desired characteristics.

1.1 The Linear Congruential Method

The established algorithm for PRN generation is the linear congruential method (LCM). More sophisticated approaches still use this method as their foundation. The fundamental relationship of the LCM is

\[ X_{i+1} = (a X_i + c) \bmod m \]

This means that the value of $X_{i+1}$ is the remainder left from the integer division of $a X_i + c$ by $m$. Note that the random numbers $R_i = X_i / m$ obtained from the LCM come from the set $I = \{0, 1/m, 2/m, \ldots, (m-1)/m\}$. One key feature of the method is its period $P$ (the number of values that can be generated before the sequence starts to repeat). The maximum achievable period, attained only for suitable choices of $a$, $c$ and the seed $X_0$, is related to the values of $m$ and $c$ as follows:

- If $m = 2^b$ and $c \neq 0$, $P = m = 2^b$.
- If $m = 2^b$ and $c = 0$, $P = m/4 = 2^{b-2}$.
- If $m$ is prime and $c = 0$, $P = m - 1$.

1.2 The Combined Linear Congruential Method

Large simulations require large collections of PRNs and there is a need for still longer periods.
These can be obtained by the use of combined linear congruential methods (CLCM). The fundamental theorem associated with CLCM is L'Ecuyer's. If $W_{i,1}, W_{i,2}, \ldots, W_{i,k}$ are independent discrete-valued random variables with at least one of them (say $W_{i,1}$) being uniformly distributed between $0$ and $m_1 - 2$, then

\[ W_i = \left( \sum_{j=1}^{k} W_{i,j} \right) \bmod (m_1 - 1) \]

is a uniformly distributed random variable between $0$ and $m_1 - 2$. More specifically, consider the following algorithm

\[ X_i = \left( \sum_{j=1}^{k} (-1)^{j-1} X_{i,j} \right) \bmod (m_1 - 1) \]

where the $X_{i,j}$ are the outputs of $k$ linear congruential generators with moduli $m_1, m_2, \ldots, m_k$, and with

\[ R_i = \begin{cases} \dfrac{X_i}{m_1}, & X_i > 0 \\[1ex] \dfrac{m_1 - 1}{m_1}, & X_i = 0 \end{cases} \]

It can be shown that the maximum period obtained with this algorithm is

\[ P = \frac{(m_1 - 1)(m_2 - 1) \cdots (m_k - 1)}{2^{k-1}} \]

Example. L'Ecuyer proposed the following CLCM. The two generators

\[ X_{i+1,1} = 40014 \, X_{i,1} \bmod 2147483563 \]
\[ X_{i+1,2} = 40692 \, X_{i,2} \bmod 2147483399 \]

produce the combined PRNG

\[ X_{i+1} = (X_{i+1,1} - X_{i+1,2}) \bmod 2147483562 \]

to yield

\[ R_{i+1} = \begin{cases} \dfrac{X_{i+1}}{2147483563}, & X_{i+1} > 0 \\[1ex] \dfrac{2147483562}{2147483563}, & X_{i+1} = 0 \end{cases} \]

2 Tests for Random Numbers

Since one always works in practice with PRN streams, it is necessary to check how close their characteristics are to those of real RN streams. Assume a stream containing $N$ PRNs has been produced. To verify their characteristics the stream is subjected to various tests. In all cases, one states a hypothesis about a given characteristic of the stream and then accepts it or rejects it with a given level of significance $\alpha$, where

\[ \alpha = P(\text{rejecting } H_0 \mid H_0 \text{ is true}) \]

(i.e. the probability of a Type I error).
In testing for uniformity:

- The null hypothesis $H_0$ is $R_i \sim U[0, 1]$,
- while the alternative hypothesis $H_1$ is $R_i \not\sim U[0, 1]$.

In testing for independence:

- The null hypothesis $H_0$ is that the $R_i$ are independent,
- while the alternative hypothesis $H_1$ is that the $R_i$ are not independent.

2.1 Kolmogorov-Smirnov Frequency Test

For this test the numbers are first arranged in increasing order

\[ R_{(1)} \le R_{(2)} \le \cdots \le R_{(N)} \]

The test makes use of the new variables

\[ D^+ = \max_{1 \le i \le N} \left( \frac{i}{N} - R_{(i)} \right) \]
\[ D^- = \max_{1 \le i \le N} \left( R_{(i)} - \frac{i - 1}{N} \right) \]

and

\[ D = \max(D^+, D^-) \]

Once $D$ has been computed, a critical value $D_c$ is obtained from the K-S statistical table for the desired $\alpha$ and the given $N$. Finally:

- If $D > D_c$, $H_0$ is rejected ($H_1$ is accepted).
- If $D \le D_c$, $H_0$ is not rejected (i.e. no departure from uniformity has been detected).

2.2 Chi-square Frequency Test

In this test the numbers are arranged into $n$ classes by subdividing the range $[0, 1]$ into $n$ subintervals and determining how many of the numbers end up in each class $i$ (the observed count $O_i$). The test uses the statistic

\[ \chi_0^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} \]

where $E_i = N/n$ is the expected count in each class for a uniform distribution. Once $\chi_0^2$ has been computed, a critical value $\chi^2_{\alpha, n-1}$ is obtained from the Chi-square statistical table. Finally:

- If $\chi_0^2 > \chi^2_{\alpha, n-1}$, $H_0$ is rejected ($H_1$ is accepted).
- If $\chi_0^2 \le \chi^2_{\alpha, n-1}$, $H_0$ is not rejected (i.e. no departure from uniformity has been detected).

2.3 Runs Test

This test aims to detect whether there are patterns in substrings of the stream. One examines the stream and checks whether each number is followed by a larger ($+$) or a smaller ($-$) number. Runs are the resulting unbroken sequences of $+$'s and $-$'s. In a truly random sequence the mean and variance of the total number of up and down runs $a$ are given by

\[ \mu_a = \frac{2N - 1}{3} \]

and

\[ \sigma_a^2 = \frac{16N - 29}{90} \]

When $N > 20$ the distribution of $a$ is close to normal, so the test statistic is

\[ Z_0 = \frac{a - \mu_a}{\sigma_a} \]

which has the normal distribution with mean zero and unit standard deviation ($N(0, 1)$). Once $Z_0$ has been computed, a critical value $z_{\alpha/2}$ is obtained from the normal statistical table.
Finally:

- If $Z_0 < -z_{\alpha/2}$ or $Z_0 > z_{\alpha/2}$, $H_0$ is rejected ($H_1$ is accepted).
- If $-z_{\alpha/2} \le Z_0 \le z_{\alpha/2}$, $H_0$ is not rejected (i.e. no departure from independence has been detected).

Other types of runs tests are also possible, for instance runs above and below the mean and run lengths. For runs above and below the mean, a test similar to the one above is used but with the following mean and variance for the number of runs $b$:

\[ \mu_b = \frac{2 n_1 n_2}{N} + \frac{1}{2} \]

and

\[ \sigma_b^2 = \frac{2 n_1 n_2 (2 n_1 n_2 - N)}{N^2 (N - 1)} \]

where $n_1$ and $n_2$ are, respectively, the numbers of observations above and below the mean. For run lengths one uses the Chi-square test to compare the observed number of runs of given lengths against the expected number in a truly independent stream.

2.4 Autocorrelation Test

This test aims to detect correlation among numbers in the stream separated by a specific number of positions (the lag). Consider the autocorrelation test for a lag $m$. One then investigates the behavior of the numbers $R_i, R_{i+m}, R_{i+2m}, \ldots, R_{i+(M+1)m}$. If the autocorrelation $\rho_{im} > 0$ there is positive correlation (i.e. high numbers follow high numbers and vice versa), and if $\rho_{im} < 0$ one has negative correlation. The autocorrelation is estimated by

\[ \hat{\rho}_{im} = \frac{1}{M + 1} \left[ \sum_{k=0}^{M} R_{i+km} \, R_{i+(k+1)m} \right] - 0.25 \]

where $M$ is the largest integer satisfying $i + (M + 1)m \le N$. The test statistic is in this case given by

\[ Z_0 = \frac{\hat{\rho}_{im}}{\sigma_{\hat{\rho}_{im}}} \]

where

\[ \sigma_{\hat{\rho}_{im}} = \frac{\sqrt{13M + 7}}{12(M + 1)} \]

Once $Z_0$ has been computed, a critical value $z_{\alpha/2}$ is obtained from the normal statistical table. Finally:

- If $Z_0 < -z_{\alpha/2}$ or $Z_0 > z_{\alpha/2}$, $H_0$ is rejected ($H_1$ is accepted).
- If $-z_{\alpha/2} \le Z_0 \le z_{\alpha/2}$, $H_0$ is not rejected (i.e. no departure from independence has been detected).

2.5 Gap Test

This test checks for independence by tracking the pattern of gaps between successive occurrences of a given digit in the stream. The test is performed using the Kolmogorov-Smirnov scheme.

2.6 Poker Test

This test checks for independence based on the repetition of certain digits in the sequence. The test is performed using the Chi-square scheme.
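As an illustration, L'Ecuyer's combined generator from Section 1.2 and the statistics of Sections 2.1 to 2.3 can be sketched in Python as follows. This is a minimal sketch, not a production generator: the function names, the seed values and the choice of 10 classes are assumptions made for the example, and the critical values must still be looked up in the appropriate statistical tables.

```python
import math

# Moduli and multipliers from L'Ecuyer's example in Section 1.2.
M1, A1 = 2147483563, 40014
M2, A2 = 2147483399, 40692


def combined_lcg(seed1, seed2, count):
    """Return `count` pseudo-random numbers in (0, 1).

    Assumption: 1 <= seed1 <= M1 - 1 and 1 <= seed2 <= M2 - 1.
    """
    x1, x2 = seed1, seed2
    stream = []
    for _ in range(count):
        x1 = (A1 * x1) % M1           # first multiplicative LCG (c = 0)
        x2 = (A2 * x2) % M2           # second multiplicative LCG (c = 0)
        x = (x1 - x2) % (M1 - 1)      # Python's % returns a non-negative value here
        stream.append(x / M1 if x > 0 else (M1 - 1) / M1)
    return stream


def ks_statistic(numbers):
    """Kolmogorov-Smirnov statistic D = max(D+, D-) against U[0, 1]."""
    ordered = sorted(numbers)
    n = len(ordered)
    d_plus = max((i + 1) / n - r for i, r in enumerate(ordered))
    d_minus = max(r - i / n for i, r in enumerate(ordered))
    return max(d_plus, d_minus)


def chi_square_statistic(numbers, n_classes):
    """Chi-square statistic for n_classes equal subintervals of [0, 1]."""
    observed = [0] * n_classes
    for r in numbers:
        # min() keeps a value of exactly 1.0 inside the last class.
        observed[min(int(r * n_classes), n_classes - 1)] += 1
    expected = len(numbers) / n_classes
    return sum((o - expected) ** 2 / expected for o in observed)


def runs_up_down_z(numbers):
    """Z0 for the runs up and down test (ties are counted as 'down')."""
    n = len(numbers)
    ups = [b > a for a, b in zip(numbers, numbers[1:])]
    runs = 1 + sum(s != t for s, t in zip(ups, ups[1:]))
    mean = (2 * n - 1) / 3
    std = math.sqrt((16 * n - 29) / 90)
    return (runs - mean) / std


if __name__ == "__main__":
    rs = combined_lcg(12345, 67890, 1000)   # arbitrary example seeds
    print("D    =", ks_statistic(rs))
    print("chi2 =", chi_square_statistic(rs, 10))
    print("Z0   =", runs_up_down_z(rs))
```

Each statistic is then compared with its critical value ($D_c$, $\chi^2_{\alpha, n-1}$ or $z_{\alpha/2}$) exactly as described in the corresponding section.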
3 Generation of Random Variates

Discrete event simulation (DES) models require as inputs the values of random variables with specified probability distributions. Such random variables are called random variates. Input data for DES models are collected from the field and/or produced from the best available estimates. However, the amount of data collected is rarely enough to run simulation models, and one must use the data to create streams of random variates with statistical characteristics similar to those of the original data. So, on the one hand one needs to identify the statistical characteristics of the original data, and on the other one must be able to produce large collections of random variates with those characteristics. Here we focus on the second aspect: once we have determined the probability distribution applicable to our data, we proceed to generate random variate streams for use in the simulation. This is accomplished by the inverse transform method.

3.1 The Inverse Transform Method

Given a random (or pseudo-random) number $R$ and a random variate $X$:

- Determine the cumulative distribution function of $X$, $F(X)$.
- Set $F(X) = R$.
- Solve the equation $F(X) = R$ for $X$ in terms of $R$, i.e. $X = F^{-1}(R)$.
- Repeat the above for the stream of random (or pseudo-random) numbers $R_1, R_2, \ldots, R_n$ to obtain the stream of random variates $X_1, X_2, \ldots, X_n$.

Next, the formulae obtained by the inverse transform method for several commonly used random variates are given.

3.2 Inverse Transform for the Exponential Distribution

Following are the specific steps required to obtain exponentially distributed random variates with rate $\lambda$ (mean $1/\lambda$) from a random number stream using the inverse transform method.

- $F(x) = 1 - e^{-\lambda x}$.
- Set $F(X) = 1 - e^{-\lambda X} = R$.
- $X = -\frac{1}{\lambda} \ln(1 - R)$.
- For $i = 1, 2, \ldots, n$, compute $X_i = -\frac{1}{\lambda} \ln(1 - R_i)$.

3.3 Inverse Transform for the Uniform Distribution

Following are the specific steps required to obtain uniformly distributed random variates between $a$ and $b$ from a random number stream using the inverse transform method.

- $F(x) = \frac{x - a}{b - a}$.
- Set $F(X) = \frac{X - a}{b - a} = R$.
- $X = a + (b - a) R$.
- For $i = 1, 2, \ldots, n$, compute $X_i = a + (b - a) R_i$.

3.4 Inverse Transform for the Weibull Distribution

Following are the specific steps required to obtain Weibull distributed random variates with parameters $\alpha$ and $\beta$ from a random number stream using the inverse transform method.

- $F(x) = 1 - e^{-(x/\alpha)^{\beta}}$.
- Set $F(X) = 1 - e^{-(X/\alpha)^{\beta}} = R$.
- $X = \alpha [-\ln(1 - R)]^{1/\beta}$.
- For $i = 1, 2, \ldots, n$, compute $X_i = \alpha [-\ln(1 - R_i)]^{1/\beta}$.

3.5 Inverse Transform for the Triangular Distribution

Following are the specific steps required to obtain random variates with triangular distribution between 0 and 2 with mode 1 from a random number stream using the inverse transform method.

\[ F(x) = \begin{cases} 0, & x \le 0 \\ \dfrac{x^2}{2}, & 0 < x \le 1 \\[1ex] 1 - \dfrac{(2 - x)^2}{2}, & 1 < x \le 2 \\ 1, & x > 2 \end{cases} \]

\[ X_i = \begin{cases} \sqrt{2 R_i}, & 0 \le R_i \le \frac{1}{2} \\ 2 - \sqrt{2 (1 - R_i)}, & \frac{1}{2} < R_i \le 1 \end{cases} \]

3.6 Inverse Transform for Empirical Distributions

If no appropriate distribution can be found for the data, one can resort to resampling the data. This creates an empirical distribution. A simple empirical distribution can be produced from given data by piecewise linear approximation. Assume the available data points (observations) are arranged in increasing order $x_1, x_2, \ldots, x_n$. Assume also that a probability is assigned to each resulting interval $x_{j-1} \le x \le x_j$ such that the cumulative probability of the first $j$ intervals is $c_j$. The associated random variate is obtained as

\[ X_i = x_{j-1} + \frac{x_j - x_{j-1}}{c_j - c_{j-1}} (R_i - c_{j-1}) \]

when $c_{j-1} < R_i \le c_j$.

3.7 Inverse and Direct Transforms for the Normal Distribution

The normal distribution does not have a closed-form inverse transformation. However, the following expression is an excellent approximation.
\[ X_i \approx \frac{R_i^{0.135} - (1 - R_i)^{0.135}}{0.1975} \]

A direct transformation can be used to produce two independent standard normal variates $Z_1$ and $Z_2$ from two random numbers $R_1$ and $R_2$ according to

\[ Z_1 = (-2 \ln R_1)^{1/2} \cos(2 \pi R_2) \]

and

\[ Z_2 = (-2 \ln R_1)^{1/2} \sin(2 \pi R_2) \]

Normal random variates $X_i$ with mean $\mu$ and standard deviation $\sigma$ can then be obtained from

\[ X_i = \mu + \sigma Z_i \]

3.8 Inverse Transform for the Discrete Distributions

A similar procedure to the one indicated above can be used to produce discretely distributed random variates. Since the cumulative distribution functions for discrete distributions consist of discrete jumps separated by horizontal plateaus, lookup tables are a convenient and very efficient method of generating inverses.

3.9 Other Methods of Generating Random Variates

When two or more random variables are added together to produce a new random variable with a desired distribution, one is using the method of convolution. If one generates the random variate by selectively accepting or rejecting numbers from a random number stream, one is using the acceptance-rejection technique. Detailed descriptions of these two methods as well as examples can be found in your textbook.
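The transforms of Sections 3.2 to 3.7 can be collected in a short Python sketch. The function names are hypothetical, and Python's standard `random` module stands in for the random number stream $R_1, R_2, \ldots$; both are assumptions of this example rather than part of the notes. The empirical routine further assumes that the breakpoints are passed as lists `xs` and `cs` indexed from 0, with `cs[0] = 0`, `cs[-1] = 1` and strictly increasing values.

```python
import bisect
import math
import random


def exponential_variate(r, lam):
    """Section 3.2: X = -(1/lambda) ln(1 - R)."""
    return -math.log(1.0 - r) / lam


def uniform_variate(r, a, b):
    """Section 3.3: X = a + (b - a) R."""
    return a + (b - a) * r


def weibull_variate(r, alpha, beta):
    """Section 3.4: X = alpha [-ln(1 - R)]^(1/beta)."""
    return alpha * (-math.log(1.0 - r)) ** (1.0 / beta)


def triangular_variate(r):
    """Section 3.5: triangular distribution on [0, 2] with mode 1."""
    if r <= 0.5:
        return math.sqrt(2.0 * r)
    return 2.0 - math.sqrt(2.0 * (1.0 - r))


def empirical_variate(r, xs, cs):
    """Section 3.6: piecewise-linear inverse of an empirical CDF.

    Assumption: xs[j] and cs[j] are the breakpoints x_j and c_j,
    with cs[0] = 0 and cs[-1] = 1.
    """
    j = max(1, bisect.bisect_left(cs, r))   # smallest j >= 1 with r <= cs[j]
    slope = (xs[j] - xs[j - 1]) / (cs[j] - cs[j - 1])
    return xs[j - 1] + slope * (r - cs[j - 1])


def normal_approx_variate(r):
    """Section 3.7: approximate inverse of the standard normal CDF."""
    return (r ** 0.135 - (1.0 - r) ** 0.135) / 0.1975


def box_muller(r1, r2):
    """Section 3.7: direct transformation; requires r1 > 0."""
    radius = math.sqrt(-2.0 * math.log(r1))
    angle = 2.0 * math.pi * r2
    return radius * math.cos(angle), radius * math.sin(angle)


if __name__ == "__main__":
    random.seed(2002)                        # replicable stream
    rs = [random.random() for _ in range(5)]
    print([exponential_variate(r, lam=0.5) for r in rs])
    # 1 - random() lies in (0, 1], which avoids ln(0) in box_muller.
    z1, z2 = box_muller(1.0 - random.random(), random.random())
    mu, sigma = 10.0, 2.0                    # illustrative parameters
    print(mu + sigma * z1, mu + sigma * z2)
```

Each function takes the random number as an explicit argument, so the same code works with any stream, including one produced by a linear congruential generator.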