Session 5
Generation of Random Numbers and
Random Variates
Ernesto Gutierrez-Miravete
Spring 2002
1 Generation of Random Numbers and Pseudo-Random Numbers
Recall that for a random variable X which is uniformly distributed in [0, 1] the uniform
probability density function is
$$f(x) = \begin{cases} 1, & 0 \le x \le 1 \\ 0, & \text{otherwise} \end{cases}$$
while its cumulative distribution function is
$$F(x) = \begin{cases} 0, & x < 0 \\ x, & 0 \le x < 1 \\ 1, & x \ge 1 \end{cases}$$
A random number (RN) stream is a collection of uniformly distributed random variables.
A truly random stream of numbers has the following characteristics:
- Uniformly distributed.
- Continuous-valued.
- $E(R) = 1/2$.
- $\sigma^2 = 1/12$.
- No autocorrelation between numbers.
- No runs.
In practice one always works with streams of pseudo-random numbers (PRNs). These
have approximately the same characteristics as RNs. PRNs are generated with a computer
using a numerical algorithm embedded in a computer program or routine. The requirements
of a good PRN generator (PRNG) routine are:
- Fast.
- Portable.
- Long cycle.
- Replicability.
- Produces PRNs with the desired characteristics.
1.1 The Linear Congruential Method
The established algorithm for PRN generation is the linear congruential method (LCM).
More sophisticated approaches still use this method as their foundation. The fundamental relationship of the LCM is
$$X_{i+1} = (a X_i + c) \bmod m$$
where $a$ is the multiplier, $c$ the increment, $m$ the modulus and the starting value $X_0$ the seed.
This means that the value of $X_{i+1}$ is the remainder left from integer division of $a X_i + c$ by
$m$. The random numbers are then obtained as $R_i = X_i / m$, so the values produced by the LCM are from the set
$I = \{0, 1/m, 2/m, \ldots, (m-1)/m\}$.
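The recurrence $X_{i+1} = (a X_i + c) \bmod m$ with $R_i = X_i / m$ can be sketched in a few lines of Python. This is a minimal illustration, not a production generator: the function name `lcg` and the parameter values in the usage line ($X_0 = 27$, $a = 17$, $c = 43$, $m = 100$) are assumptions chosen so the arithmetic can be checked by hand, and they are far too small for real simulation work.

```python
def lcg(seed, a, c, m, n):
    """Return n pseudo-random numbers R_i = X_i / m,
    where X_{i+1} = (a * X_i + c) mod m and X_0 = seed."""
    x = seed
    numbers = []
    for _ in range(n):
        x = (a * x + c) % m      # remainder of integer division by m
        numbers.append(x / m)    # scale the integer state into [0, 1)
    return numbers


# Illustrative (assumed) parameters: X_1 = 2, X_2 = 77, X_3 = 52.
print(lcg(seed=27, a=17, c=43, m=100, n=3))   # [0.02, 0.77, 0.52]
```

With $m = 100$ only the values 0.00, 0.01, ..., 0.99 can ever appear, which shows why a large modulus is needed in practice.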
One key feature of the method is its period $P$ (the number of values that can be
generated before the sequence starts to repeat). The longest achievable period is related to the values of $m$
and $c$ as follows:
- If $m = 2^b$ and $c \ne 0$, $P = m = 2^b$.
- If $m = 2^b$ and $c = 0$, $P = m/4 = 2^{b-2}$.
- If $m$ is prime and $c = 0$, $P = m - 1$.
These maximum periods are reached only for suitable choices of the multiplier $a$ and the seed $X_0$.
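The three period rules (period $m$ for $m = 2^b$ with $c \ne 0$, period $m/4$ for $m = 2^b$ with $c = 0$, and period $m - 1$ for prime $m$ with $c = 0$) can be checked by brute force on toy generators. The sketch below simply iterates the recurrence until a state repeats. The helper name `lcg_period` and all parameter values are assumptions picked for illustration, and the approach is only practical for small $m$.

```python
def lcg_period(seed, a, c, m):
    """Length of the cycle that the LCG state X_i eventually falls into."""
    first_seen = {}              # state -> step at which it first appeared
    x, step = seed, 0
    while x not in first_seen:
        first_seen[x] = step
        x = (a * x + c) % m
        step += 1
    return step - first_seen[x]


# Assumed toy parameters, one per rule:
print(lcg_period(seed=7, a=5, c=3, m=16))    # m = 2^4, c != 0: prints 16 (= m)
print(lcg_period(seed=1, a=13, c=0, m=64))   # m = 2^6, c = 0:  prints 16 (= m/4)
print(lcg_period(seed=1, a=3, c=0, m=17))    # m prime, c = 0:  prints 16 (= m - 1)
```

Changing the multiplier (for instance $a = 4$ in the first line) shortens the cycle, which illustrates that the rules give upper limits rather than guarantees.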
1.2 The Combined Linear Congruential Method
Large simulations require large collections of PRNs, so there is a need for still longer periods.
These can be obtained by the use of combined linear congruential methods (CLCM).
The fundamental theorem associated with CLCM is due to L'Ecuyer:
If $W_{i,1}, W_{i,2}, \ldots, W_{i,k}$ are independent discrete-valued random variables with at least one
of them (say $W_{i,1}$) being uniformly distributed on the integers from 0 to $m_1 - 2$, then
$$W_i = \left( \sum_{j=1}^{k} W_{i,j} \right) \bmod (m_1 - 1)$$
is a uniformly distributed random variable on the integers from 0 to $m_1 - 2$.
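More specifically, consider the following algorithm
$$X_i = \left( \sum_{j=1}^{k} (-1)^{j-1} X_{i,j} \right) \bmod (m_1 - 1)$$
where the $X_{i,j}$ are generated by $k$ linear congruential generators with moduli $m_1, m_2, \ldots, m_k$, and with
$$R_i = \begin{cases} \dfrac{X_i}{m_1}, & X_i > 0 \\[4pt] \dfrac{m_1 - 1}{m_1}, & X_i = 0 \end{cases}$$
It can be shown that the maximum period obtained with this algorithm is
$$P = \frac{(m_1 - 1)(m_2 - 1) \cdots (m_k - 1)}{2^{k-1}}$$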
Example. L'Ecuyer proposed the following CLCM. The two generators
$$X_{1,j+1} = 40014\, X_{1,j} \bmod 2147483563$$
$$X_{2,j+1} = 40692\, X_{2,j} \bmod 2147483399$$
produce the combined PRNG
$$X_{j+1} = (X_{1,j+1} - X_{2,j+1}) \bmod 2147483562$$
to yield
$$R_{j+1} = \begin{cases} \dfrac{X_{j+1}}{2147483563}, & X_{j+1} > 0 \\[4pt] \dfrac{2147483562}{2147483563}, & X_{j+1} = 0 \end{cases}$$
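A minimal Python sketch of this two-generator example follows. The multipliers and moduli are the ones quoted in the example (40014 and 2147483563, 40692 and 2147483399); the function name `combined_lcg` and the seeds in the usage line are assumptions. Each seed should lie between 1 and its modulus minus 1.

```python
M1, A1 = 2147483563, 40014   # first generator: modulus, multiplier
M2, A2 = 2147483399, 40692   # second generator: modulus, multiplier


def combined_lcg(seed1, seed2, n):
    """Return n numbers in (0, 1) from the combined generator."""
    x1, x2 = seed1, seed2
    numbers = []
    for _ in range(n):
        x1 = (A1 * x1) % M1
        x2 = (A2 * x2) % M2
        # Python's % always returns a non-negative result for a positive
        # modulus, so a negative difference wraps correctly.
        x = (x1 - x2) % (M1 - 1)
        numbers.append(x / M1 if x > 0 else (M1 - 1) / M1)
    return numbers


stream = combined_lcg(seed1=12345, seed2=67890, n=5)   # assumed seeds
max_period = (M1 - 1) * (M2 - 1) // 2                  # formula with k = 2
```

For these moduli `max_period` is on the order of $2 \times 10^{18}$, far beyond the period of either component generator.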
2 Tests for Random Numbers
Since one always works in practice with PRN streams, it is necessary to check how close
their characteristics are to those of real RN streams. Assume a stream containing $N$ PRNs
has been produced. To verify their characteristics the stream is subjected to various tests.
In all cases, one states a hypothesis about a given characteristic of the stream and then
either rejects it or fails to reject it at a given level of significance $\alpha$, where
$$\alpha = P(\text{rejecting } H_0 \mid H_0 \text{ is true})$$
(i.e. the probability of a Type I error).
In testing for uniformity the null hypothesis $H_0$ is
$$R_i \sim U[0, 1]$$
while the alternative hypothesis $H_1$ is
$$R_i \not\sim U[0, 1]$$
In testing for independence the null hypothesis $H_0$ is
$$R_i \sim \text{independently}$$
while the alternative hypothesis $H_1$ is
$$R_i \not\sim \text{independently}$$
2.1 Kolmogorov-Smirnov Frequency Test
For this test the numbers are first arranged in increasing order
$$R_1 \le R_2 \le \cdots \le R_N$$
The test makes use of the new variables
$$D^+ = \max_{1 \le i \le N} \left( \frac{i}{N} - R_i \right)$$
$$D^- = \max_{1 \le i \le N} \left( R_i - \frac{i-1}{N} \right)$$
and
$$D = \max(D^+, D^-)$$
Once $D$ has been computed, a critical value $D_c$ is obtained from the K-S statistical
table for the desired $\alpha$ and the given $N$. Finally
- If $D > D_c$, $H_0$ is rejected ($H_1$ is accepted).
- If $D \le D_c$, $H_0$ is not rejected (i.e. no departure from uniformity has been detected).
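The K-S statistic $D = \max(D^+, D^-)$, with $D^+ = \max_i (i/N - R_i)$ and $D^- = \max_i (R_i - (i-1)/N)$ over the sorted numbers, can be computed as in the sketch below. The function name `ks_statistic`, the five sample values and the critical value 0.565 (quoted from a standard K-S table for $\alpha = 0.05$, $N = 5$) are assumptions for illustration; the critical value should be checked against your own table.

```python
def ks_statistic(rs):
    """Return (D_plus, D_minus, D) for a list of numbers in [0, 1]."""
    ordered = sorted(rs)
    n = len(ordered)
    # enumerate() counts from 0, so the rank i of the formulas is idx + 1.
    d_plus = max((idx + 1) / n - r for idx, r in enumerate(ordered))
    d_minus = max(r - idx / n for idx, r in enumerate(ordered))
    return d_plus, d_minus, max(d_plus, d_minus)


sample = [0.44, 0.81, 0.14, 0.05, 0.93]    # made-up data
d_plus, d_minus, d = ks_statistic(sample)  # approximately 0.26, 0.21, 0.26
d_critical = 0.565                         # assumed table value (alpha = 0.05, N = 5)
print("reject H0" if d > d_critical else "do not reject H0")
```

Here $D$ is well below the critical value, so the test gives no reason to reject uniformity for this small sample.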
2.2 Chi-square Frequency Test
In this test the numbers are arranged into $n$ classes by subdividing the range [0, 1] into $n$
subintervals and determining how many of the numbers end up in each class $i$ ($O_i$).
The test uses the statistic
$$\chi_0^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$
where $E_i = N/n$ is the expected number of numbers in each class for a uniform distribution.
Once $\chi_0^2$ has been computed, a critical value $\chi^2_{\alpha, n-1}$ is obtained from the Chi-square
statistical table. Finally
- If $\chi_0^2 > \chi^2_{\alpha, n-1}$, $H_0$ is rejected ($H_1$ is accepted).
- If $\chi_0^2 \le \chi^2_{\alpha, n-1}$, $H_0$ is not rejected (i.e. no departure from uniformity has been detected).
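A short sketch of the statistic $\chi_0^2 = \sum_i (O_i - E_i)^2 / E_i$ with $n$ equal-width classes and $E_i = N/n$ is given below. The function name `chi_square_statistic` is an assumption, and the critical value still has to be read from a Chi-square table with $n - 1$ degrees of freedom.

```python
def chi_square_statistic(rs, n_classes):
    """Chi-square frequency statistic for numbers in [0, 1]
    using n_classes equal-width subintervals."""
    observed = [0] * n_classes
    for r in rs:
        # A value of exactly 1.0 is counted in the last class.
        idx = min(int(r * n_classes), n_classes - 1)
        observed[idx] += 1
    expected = len(rs) / n_classes       # E_i = N / n for every class
    return sum((o - expected) ** 2 / expected for o in observed)


# Made-up extremes: one number per class, and all numbers in one class.
evenly_spread = [0.05 + 0.1 * k for k in range(10)]
print(chi_square_statistic(evenly_spread, 10))   # 0.0
print(chi_square_statistic([0.05] * 10, 2))      # 10.0
```

A statistic of zero means the observed counts match the expected counts exactly; large values indicate that some classes are over- or under-populated.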
2.3 Runs Test
This test aims to detect whether there are patterns in substrings of the stream. One examines
the stream and checks whether each number is followed by a larger (+) or a smaller (-)
number. Runs are the resulting unbroken sequences of +'s and -'s. In a truly random sequence the
mean and variance of the number of up and down runs $a$ are given by
$$\mu_a = \frac{2N - 1}{3}$$
and
$$\sigma_a^2 = \frac{16N - 29}{90}$$
When $N > 20$ the distribution of $a$ is close to normal so the test statistic is
$$Z_0 = \frac{a - \mu_a}{\sigma_a}$$
which has the normal distribution of mean zero and unit standard deviation ($N(0, 1)$).
Once $Z_0$ has been computed a critical value $z_{\alpha/2}$ is obtained from the normal statistical
table. Finally
- If $Z_0 < -z_{\alpha/2}$ or $Z_0 > z_{\alpha/2}$, $H_0$ is rejected ($H_1$ is accepted).
- If $-z_{\alpha/2} \le Z_0 \le z_{\alpha/2}$, $H_0$ is not rejected (i.e. no dependence has been detected).
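The runs up and down test, with $\mu_a = (2N - 1)/3$, $\sigma_a^2 = (16N - 29)/90$ and $Z_0 = (a - \mu_a)/\sigma_a$, can be sketched as follows. The function name `runs_up_down` and the two test streams are assumptions. Ties between neighbouring numbers are treated as "-" here, which is a simplifying choice and not part of the test as described.

```python
import math


def runs_up_down(rs):
    """Return (a, Z0): the number of up/down runs and the test statistic."""
    n = len(rs)
    # True marks a '+' (next number larger), False marks a '-'.
    signs = [later > earlier for earlier, later in zip(rs, rs[1:])]
    # A new run starts whenever the sign changes.
    runs = 1 + sum(1 for s, t in zip(signs, signs[1:]) if s != t)
    mean = (2 * n - 1) / 3
    variance = (16 * n - 29) / 90
    return runs, (runs - mean) / math.sqrt(variance)


# Two made-up, clearly non-random streams (N > 20 in both cases):
increasing = [k / 25 for k in range(25)]    # a single run up
alternating = [0.1, 0.9] * 13               # 25 runs of length 1
print(runs_up_down(increasing))
print(runs_up_down(alternating))
```

With $z_{0.025} = 1.96$, the increasing stream gives far too few runs ($Z_0 < -1.96$) and the alternating stream far too many ($Z_0 > 1.96$), so independence is rejected in both cases.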
Other types of runs tests are also possible, for instance runs above and below the mean
and run lengths. For runs above and below the mean a test similar to the one above is used
but with the values of mean and variance for the number of runs $b$
$$\mu_b = \frac{2 n_1 n_2}{N} + \frac{1}{2}$$
and
$$\sigma_b^2 = \frac{2 n_1 n_2 (2 n_1 n_2 - N)}{N^2 (N - 1)}$$
where $n_1$ and $n_2$ are, respectively, the numbers of individual observations above and below the mean.
For run lengths one uses the Chi-square test to compare the observed number of runs
of given lengths against the expected number obtained in a truly independent stream.
2.4 Autocorrelation Test
This test aims to detect correlation among numbers in the stream separated by a specific
number of positions (the lag). Consider the autocorrelation test for a lag $m$ starting at the $i$-th number. One
then investigates the behavior of the numbers $R_i, R_{i+m}, R_{i+2m}, \ldots, R_{i+(M+1)m}$. If the autocorrelation $\rho_{im} > 0$ there is positive
correlation (i.e. high numbers tend to follow high numbers, and low numbers follow low numbers) and if $\rho_{im} < 0$ one has
negative correlation (low numbers tend to follow high numbers, and vice versa). The autocorrelation is estimated by
$$\hat{\rho}_{im} = \frac{1}{M + 1} \left[ \sum_{k=0}^{M} R_{i+km} R_{i+(k+1)m} \right] - 0.25$$
where $M$ is the largest integer satisfying $i + (M + 1)m \le N$. The test statistic is in this case
given by
$$Z_0 = \frac{\hat{\rho}_{im}}{\sigma_{\hat{\rho}_{im}}}$$
where
$$\sigma_{\hat{\rho}_{im}} = \frac{\sqrt{13M + 7}}{12(M + 1)}$$
Once $Z_0$ has been computed a critical value $z_{\alpha/2}$ is obtained from the normal statistical
table. Finally
- If $Z_0 < -z_{\alpha/2}$ or $Z_0 > z_{\alpha/2}$, $H_0$ is rejected ($H_1$ is accepted).
- If $-z_{\alpha/2} \le Z_0 \le z_{\alpha/2}$, $H_0$ is not rejected (i.e. no dependence has been detected).
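The autocorrelation estimate $\hat{\rho}_{im} = \frac{1}{M+1} \sum_{k=0}^{M} R_{i+km} R_{i+(k+1)m} - 0.25$ and the statistic $Z_0 = \hat{\rho}_{im} / \sigma_{\hat{\rho}_{im}}$, with $\sigma_{\hat{\rho}_{im}} = \sqrt{13M + 7} / (12(M + 1))$, can be sketched as below. The function name `autocorrelation_test` and the sample streams are assumptions. The index $i$ is 1-based as in the formulas, while the Python list is 0-based.

```python
import math


def autocorrelation_test(rs, i, m):
    """Return (rho_hat, Z0) for start index i (1-based) and lag m."""
    n = len(rs)
    big_m = (n - i) // m - 1     # largest M with i + (M + 1) * m <= N
    if big_m < 0:
        raise ValueError("stream too short for this start index and lag")
    # R_{i + k*m} is stored at list position i - 1 + k*m.
    total = sum(rs[i - 1 + k * m] * rs[i - 1 + (k + 1) * m]
                for k in range(big_m + 1))
    rho_hat = total / (big_m + 1) - 0.25
    sigma = math.sqrt(13 * big_m + 7) / (12 * (big_m + 1))
    return rho_hat, rho_hat / sigma


# Made-up streams: a constant 0.5 gives rho_hat = 0,
# a constant 1.0 gives the strongly positive value 0.75.
print(autocorrelation_test([0.5] * 30, i=3, m=5))
print(autocorrelation_test([1.0] * 12, i=1, m=1))
```

For $N = 30$, $i = 3$ and $m = 5$ the code uses $M = 4$, i.e. the numbers $R_3, R_8, R_{13}, R_{18}, R_{23}, R_{28}$.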
2.5 Gap Test
This test checks for independence by tracking the pattern of gaps between successive occurrences of a given
digit in the stream. The test is performed using the Kolmogorov-Smirnov scheme.
2.6 Poker Test
This test checks for independence based on how often digits are repeated within the numbers of the sequence.
The test is performed using the Chi-square scheme.
3 Generation of Random Variates
Discrete event simulation (DES) models require as inputs the values of random variables with
specified probability distributions. Such random variables are called random variates.
Input data for DES models are collected from the field and/or produced from best available
estimates. However, the amount of data collected is rarely enough to run simulation models
and one must use the data to create streams of random variates with statistical characteristics similar to
those of the original data.
So, on the one hand one needs to identify the statistical characteristics of the original
data and on the other one must be able to produce large collections of random variates
with statistical characteristics similar to those of the original data. Here we focus on the
second aspect, namely once we have determined the probability distribution applicable to
our data we proceed to generate random variate streams for use in the simulation. This is
accomplished by the inverse transform method.
3.1 The Inverse Transform Method
Given a random (or pseudo-random) number $R$ and a random variate $X$,
1. Determine the cumulative distribution function of $X$, $F(X)$.
2. Set $F(X) = R$.
3. Solve the equation $F(X) = R$ for $X$ in terms of $R$, i.e. $X = F^{-1}(R)$.
4. Repeat the above for the stream of random (or pseudo-random) numbers $R_1, R_2, \ldots, R_n$
to obtain the stream of random variates $X_1, X_2, \ldots, X_n$.
Next, the formulae obtained by the inverse transform method for several commonly used
random variates are given.
3.2 Inverse Transform for the Exponential Distribution
Following are the specific steps required to obtain exponentially distributed random variates
with rate $\lambda$ (mean $1/\lambda$) from a random number stream using the inverse transform method.
- $F(x) = 1 - e^{-\lambda x}$, $x \ge 0$.
- Set $F(X) = 1 - e^{-\lambda X} = R$.
- $X = -\frac{1}{\lambda} \ln(1 - R)$.
- For $i = 1, 2, \ldots, n$, compute $X_i = -\frac{1}{\lambda} \ln(1 - R_i)$.
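The exponential formula $X_i = -\frac{1}{\lambda} \ln(1 - R_i)$ translates directly into code. In this sketch the function name `exponential_variates` and the argument name `lam` (the rate $\lambda$) are assumptions, and every $R_i$ is assumed to lie in $[0, 1)$ so that the logarithm is defined.

```python
import math


def exponential_variates(rs, lam):
    """Map random numbers R_i in [0, 1) to exponential variates with rate lam."""
    return [-math.log(1.0 - r) / lam for r in rs]


# Made-up stream; with lam = 2 the variates have mean 1 / 2.
xs = exponential_variates([0.1, 0.5, 0.9], lam=2.0)
```

Larger random numbers map to larger variates, since $F^{-1}$ is increasing; $R = 0.5$ gives the median $\ln 2 / \lambda$.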
3.3 Inverse Transform for the Uniform Distribution
Following are the specific steps required to obtain uniformly distributed random variates
between $a$ and $b$ from a random number stream using the inverse transform method.
- $F(x) = \frac{x - a}{b - a}$, $a \le x \le b$.
- Set $F(X) = \frac{X - a}{b - a} = R$.
- $X = a + (b - a) R$.
- For $i = 1, 2, \ldots, n$, compute $X_i = a + (b - a) R_i$.
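For completeness, the uniform case $X_i = a + (b - a) R_i$ is a one-line rescaling. The function name `uniform_variates` and the end points in the usage line are assumptions.

```python
def uniform_variates(rs, a, b):
    """Rescale random numbers R_i in [0, 1] to the interval [a, b]."""
    return [a + (b - a) * r for r in rs]


print(uniform_variates([0.0, 0.5, 1.0], a=2.0, b=6.0))   # [2.0, 4.0, 6.0]
```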
3.4 Inverse Transform for the Weibull Distribution
Following are the specific steps required to obtain Weibull distributed random variates with
scale parameter $\alpha$ and shape parameter $\beta$ from a random number stream using the inverse transform method.
- $F(x) = 1 - e^{-(x/\alpha)^\beta}$, $x \ge 0$.
- Set $F(X) = 1 - e^{-(X/\alpha)^\beta} = R$.
- $X = \alpha [-\ln(1 - R)]^{1/\beta}$.
- For $i = 1, 2, \ldots, n$, compute $X_i = \alpha [-\ln(1 - R_i)]^{1/\beta}$.
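The Weibull formula $X_i = \alpha [-\ln(1 - R_i)]^{1/\beta}$ can be sketched in the same way. The function name `weibull_variates` is an assumption, `alpha` and `beta` stand for the scale and shape parameters, and each $R_i$ is assumed to lie in $[0, 1)$.

```python
import math


def weibull_variates(rs, alpha, beta):
    """Map random numbers R_i in [0, 1) to Weibull variates
    with scale alpha and shape beta."""
    return [alpha * (-math.log(1.0 - r)) ** (1.0 / beta) for r in rs]


# Made-up stream and parameters.
xs = weibull_variates([0.1, 0.5, 0.9], alpha=3.0, beta=2.0)
```

With $\beta = 1$ the expression reduces to $-\alpha \ln(1 - R_i)$, an exponential variate with mean $\alpha$, which is a convenient sanity check.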
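3.5 Inverse Transform for the Triangular Distribution
Following are the specific steps required to obtain random variates with triangular distribution
between 0 and 2 with mode 1 from a random number stream using the inverse transform
method.
The cumulative distribution function is
$$F(x) = \begin{cases} 0, & x \le 0 \\ \dfrac{x^2}{2}, & 0 < x \le 1 \\ 1 - \dfrac{(2 - x)^2}{2}, & 1 < x \le 2 \\ 1, & x > 2 \end{cases}$$
so that
$$X_i = \begin{cases} \sqrt{2 R_i}, & 0 \le R_i \le \tfrac{1}{2} \\ 2 - \sqrt{2 (1 - R_i)}, & \tfrac{1}{2} < R_i \le 1 \end{cases}$$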
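The two-branch inverse for the triangular distribution on [0, 2] with mode 1, namely $X_i = \sqrt{2 R_i}$ for $R_i \le 1/2$ and $X_i = 2 - \sqrt{2(1 - R_i)}$ otherwise, can be sketched as follows. The function name `triangular_variates` is an assumption.

```python
import math


def triangular_variates(rs):
    """Map random numbers R_i in [0, 1] to triangular(0, 1, 2) variates."""
    variates = []
    for r in rs:
        if r <= 0.5:
            variates.append(math.sqrt(2.0 * r))                # rising side
        else:
            variates.append(2.0 - math.sqrt(2.0 * (1.0 - r)))  # falling side
    return variates


print(triangular_variates([0.0, 0.125, 0.5, 0.875, 1.0]))
# [0.0, 0.5, 1.0, 1.5, 2.0]
```

The two branches meet at $R = 1/2$, where both give the mode $X = 1$.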
3.6 Inverse Transform for Empirical Distributions
If no appropriate distribution can be found for the data one can resort to resampling the
data. This creates an empirical distribution. A simple empirical distribution can be
produced from given data by piecewise linear approximation.
Assume the available data points (observations) are arranged in increasing order $x_1, x_2, \ldots, x_n$.
Assume also that a probability is assigned to each resulting interval $x_{j-1} < x \le x_j$ such that the
cumulative probability of the first $j$ intervals is $c_j$, with $c_0 = 0$ at the lower end $x_0$ of the first interval. The associated random variate is
obtained as
$$X_i = x_{j-1} + \frac{x_j - x_{j-1}}{c_j - c_{j-1}} (R_i - c_{j-1})$$
when $c_{j-1} < R_i \le c_j$.
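The piecewise linear interpolation $X_i = x_{j-1} + \frac{x_j - x_{j-1}}{c_j - c_{j-1}} (R_i - c_{j-1})$ for $c_{j-1} < R_i \le c_j$ amounts to a table search followed by a straight-line formula. In the sketch below the function name `empirical_variate`, the convention that the lists start at $x_0$ and $c_0 = 0$, and the small data set are all assumptions; the cumulative probabilities are assumed to be strictly increasing and to end at 1.

```python
from bisect import bisect_left


def empirical_variate(r, xs, cs):
    """Invert a piecewise linear empirical CDF.

    xs = [x_0, x_1, ..., x_n] are the interval end points and
    cs = [c_0 = 0, c_1, ..., c_n = 1] the cumulative probabilities.
    """
    # bisect_left finds the smallest j with cs[j] >= r, i.e. c_{j-1} < r <= c_j.
    # max(1, ...) sends r = 0 to the first interval.
    j = max(1, bisect_left(cs, r))
    slope = (xs[j] - xs[j - 1]) / (cs[j] - cs[j - 1])
    return xs[j - 1] + slope * (r - cs[j - 1])


# Made-up table: half of the probability on [0, 1], half on [1, 3].
xs = [0.0, 1.0, 3.0]
cs = [0.0, 0.5, 1.0]
print(empirical_variate(0.25, xs, cs))   # 0.5
print(empirical_variate(0.75, xs, cs))   # 2.0
```

The slope $(x_j - x_{j-1}) / (c_j - c_{j-1})$ is large on intervals that carry little probability relative to their width, so variates are spread thinly there.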
3.7 Inverse and Direct Transforms for the Normal Distribution
The normal distribution does not have a closed-form inverse transformation. However, the
following expression is a good approximation to the inverse of the standard normal distribution:
$$X_i \approx \frac{R_i^{0.135} - (1 - R_i)^{0.135}}{0.1975}$$
A direct transformation can be used to produce two independent standard normal variates
$Z_1$ and $Z_2$ from two random numbers $R_1$ and $R_2$ according to
$$Z_1 = (-2 \ln R_1)^{1/2} \cos(2 \pi R_2)$$
and
$$Z_2 = (-2 \ln R_1)^{1/2} \sin(2 \pi R_2)$$
Normal random variates $X_i$ with mean $\mu$ and standard deviation $\sigma$ can then be obtained
from
$$X_i = \mu + \sigma Z_i$$
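Both routes to normal variates can be sketched briefly: the approximate inverse $X \approx [R^{0.135} - (1 - R)^{0.135}] / 0.1975$ and the direct transformation $Z_1 = (-2 \ln R_1)^{1/2} \cos(2\pi R_2)$, $Z_2 = (-2 \ln R_1)^{1/2} \sin(2\pi R_2)$, followed by $X = \mu + \sigma Z$. The function names are assumptions, and $R_1$ is assumed to be strictly positive so that the logarithm is defined.

```python
import math


def approx_inverse_normal(r):
    """Approximate standard normal variate from one random number r in [0, 1]."""
    return (r ** 0.135 - (1.0 - r) ** 0.135) / 0.1975


def direct_transform(r1, r2):
    """Two independent standard normal variates from r1 in (0, 1] and r2 in [0, 1]."""
    radius = math.sqrt(-2.0 * math.log(r1))
    angle = 2.0 * math.pi * r2
    return radius * math.cos(angle), radius * math.sin(angle)


def normal_variates(rs, mu, sigma):
    """Consume rs in pairs and return variates with mean mu and std dev sigma."""
    variates = []
    for r1, r2 in zip(rs[0::2], rs[1::2]):
        z1, z2 = direct_transform(r1, r2)
        variates.extend([mu + sigma * z1, mu + sigma * z2])
    return variates


# Made-up stream of four random numbers gives four normal variates.
xs = normal_variates([0.3, 0.8, 0.6, 0.1], mu=10.0, sigma=2.0)
```

Since $\cos^2 + \sin^2 = 1$, each pair satisfies $Z_1^2 + Z_2^2 = -2 \ln R_1$, which gives a quick check on an implementation.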
3.8 Inverse Transform for Discrete Distributions
A similar procedure to the one indicated above can be used to produce discretely distributed
random variates. Since the cumulative distribution functions for discrete distributions
consist of discrete jumps separated by horizontal plateaus, lookup tables are a convenient
and very efficient method of generating inverses.
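A lookup table for a discrete distribution can be sketched as a sorted list of cumulative probabilities that is searched for the first entry not smaller than $R$. The function name `discrete_variate` and the three-valued distribution in the usage lines are assumptions made for illustration.

```python
from bisect import bisect_left


def discrete_variate(r, values, cum_probs):
    """Return values[j] for the smallest j with cum_probs[j] >= r."""
    return values[bisect_left(cum_probs, r)]


# Made-up distribution: p(0) = 0.5, p(1) = 0.3, p(2) = 0.2.
values = [0, 1, 2]
cum_probs = [0.5, 0.8, 1.0]
print(discrete_variate(0.73, values, cum_probs))   # 1
print(discrete_variate(0.95, values, cum_probs))   # 2
```

Because the table is sorted, the binary search used by `bisect_left` keeps each lookup fast even when the distribution has many possible values.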
3.9 Other Methods of Generating Random Variates
When two or more random variables are added together to produce a new random variable
with a desired distribution one is using the method of convolution.
If one generates the random variate by selectively accepting or rejecting numbers from a
random number stream one is using the acceptance-rejection technique.
Detailed descriptions of these two methods as well as examples can be found in your
textbook.