28.10.1998

Random Numbers, Probability Theory and Statistics
• ref: Law & Kelton -91, Chapter 4
• Random numbers
• Distributions
• Samples
• Confidence intervals

Copyright Teemu Kerola 1998

Random Numbers
• Random number (satunnaisluku)
• Sample space (otosavaruus)
  – discrete: S = {1, 2, 3, 4, 5, 6}
  – continuous: S = [0, 1] or S = [0, 1)
• Random variable (satunnaismuuttuja)
  – one way to generate random numbers
  – determined by a function or rule: R -> S
  – random variables: X, Y = 5X, Z = g(X, Y)

Random Variable X
• Distribution function (kertymäfunktio): F(x) = P(X ≤ x)
• Density function (tiheysfunktio): f(x); for discrete X, f(x) = P(X = x) = P(x)
  – dice: P(X = 6) = 1/6
  – temperature: P(20 ≤ X ≤ 24) = 80%
• Both F(x) and f(x) determine the (same) distribution
• X discrete:
  P(a ≤ X ≤ b) = ∑_{a ≤ x ≤ b} f(x)
• X continuous:
  P(a < X ≤ b) = ∫_a^b f(x) dx = F(b) − F(a)

[Figure: discrete distribution X = Random[1,6]: density f(x) = 1/6 at x = 1, ..., 6 (sum = 1); step distribution function F(x) rising from 0 to 1]
[Figure: continuous distribution X = Random[1,6]: density f(x) = 1/5 on [1, 6] (area = 1); linear distribution function F(x)]
[Figure: continuous distribution X = Random[0,1): density f(x) = 1 on [0, 1) (area = 1); linear distribution function F(x)]
[Figure: continuous distribution X = Expon(a): density f(x) starting at 1/a and decaying to 0 (area = 1); memoryless property (unohtavaisuusominaisuus)]

Properties for Random Var. X
• Percentile (fraktiili)
  – e.g., 5% percentile = X0.05, 50% percentile = X0.5, 95% percentile = X0.95
• Median (mediaani): X0.5, with F(X0.5) = 0.5
• Mean, expected value (keskiarvo): EX = E(X) = µ
  – X discrete: EX = ∑_x x P(x)
  – X continuous: EX = ∫ x f(x) dx

Variance and Deviation (varianssi, hajonta)
• Average E(X) = EX = µ = ∫ x f(x) dx
  – describes where the values are located
• Variance Var(X) = σ² = E[(X − µ)²] = E(X²) − µ²
  – describes how close to the average the values are
• Standard deviation σ = √σ²
[Figure: densities with very small, small, and large variance around the same mean µ]

Standard Normal Distribution
• Normal distribution Norm(µ, σ²)
• Standard normal distribution = Z-distribution = Norm(0, 1) = N(0, 1)
• Z-percentiles: Z0.95 = 1.645 = Z_{1−α/2} for α = 10%
[Figure: Norm(0,1) density; 90% of the probability mass lies within (−1.65, 1.65), 60% within (−0.84, 0.84)]

Covariance (kovarianssi)
• Does X depend on Y, or vice versa, or what?
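The dice distribution and the definitions above (density, distribution function, mean, variance) can be checked with a short script. This is an illustrative sketch, not code from the course; the names `f`, `F`, `mu`, and `var` simply mirror the slide notation:

```python
# Discrete uniform random variable X = Random[1,6] (a fair dice).
from fractions import Fraction

OUTCOMES = range(1, 7)  # sample space S = {1, 2, 3, 4, 5, 6}

def f(x):
    """Density function f(x) = P(X = x)."""
    return Fraction(1, 6) if x in OUTCOMES else Fraction(0)

def F(x):
    """Distribution function F(x) = P(X <= x)."""
    return sum(f(k) for k in OUTCOMES if k <= x)

mu = sum(x * f(x) for x in OUTCOMES)               # E(X) = 7/2
var = sum(x * x * f(x) for x in OUTCOMES) - mu**2  # E(X^2) - mu^2 = 35/12
```

Exact fractions avoid rounding: E(X) = 7/2 and Var(X) = 35/12 ≈ 2.92 follow directly from the definitions on the slides.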
• Covariance Cov(X, Y) = C_XY = E[(X − µ_X)(Y − µ_Y)] = E(XY) − µ_X µ_Y
• C_XY = C_YX
• C_XX = Cov(X, X) = Var(X) = σ²_X
• C_XY = 0: X, Y uncorrelated (X large => Y ???)
• C_XY > 0: X, Y positively correlated (X large => Y large)
• C_XY < 0: X, Y negatively correlated (X large => Y small)

Correlation (korrelaatio)
• X and Y are random variables
• Problem: C_XY varies much in magnitude
  – one cannot deduce much from its magnitude alone
• Solution: scale with the geometric average of the variances:
  Cor(X, Y) = ρ_XY = C_XY / √(σ²_X σ²_Y) ∈ [−1, 1]
  – ρ_XY = 0: X, Y uncorrelated
  – ρ_XY > 0: X, Y positively correlated
  – ρ_XY < 0: X, Y negatively correlated
  – ρ_XY = 0.98: X, Y highly positively correlated
• Correlation does not mean causal dependence
  – e.g., figure on homework vs. exam results

Independent X and Y? (riippumattomuus)
• Random variables X, Y
• Joint probability density function (yhteisjakauman tiheysfunktio) f(X, Y):
  P(X ∈ A, Y ∈ B) = ∫_A ∫_B f(x, y) dx dy
• X and Y are independent if f(x, y) = fX(x) · fY(y)
  – fX(x) = ∫_{−∞}^{∞} f(x, y) dy is the marginal density of X
  – fY(y) = ∫_{−∞}^{∞} f(x, y) dx is the marginal density of Y

Set of Random Variables
• Many random variables Xi, i ∈ [1, N] or i ∈ [1, ∞]
• E(Xi) = EXi = µ_Xi = µ_i
• Var(Xi) = σ_Xi² = σ_i²
• Cov(Xi, Xj) = C_Xi,Xj = C_i,j = C_ij
• Cor(Xi, Xj) = ρ_Xi,Xj = ρ_i,j = ρ_ij
• All comparisons pair-wise!

IID Set of Random Variables
• Random variables X1, X2, ...
• Xi's identically distributed?
  – EXi = µ_i = µ for all i = 1, 2, ...
  – Var(Xi) = σ_i² = σ² for all i = 1, 2, ...
  – fXi(x) = f(x) for all i = 1, 2, ...
• Xi's independent?
  – Xi and Xj are independent for all i ≠ j in {1, 2, ...}
• Now the Xi's are IID
  – independent and identically distributed (IID)
  – almost never the case in real life
  – almost always needed for the math to work out

Sample
• Problem: what is the distribution of X?
  – What are EX = µ and Var(X) = σ²?
• Solution: try it out and see
• Sample: one set of values for X
  – throw a dice 100 times? 10 000 times?
  – measure the response time from 100 trials?
  – write down the sales sum for the next 10 years?
• New question: what can we tell about X from the sample?

Sample Properties
• Sample set (otosjoukko): S = { Xi | i = 1, ..., n }
• Sample points (otospisteet) Xi
  – should be a set of IID random variables
  – Xi ~ X, i.e., each Xi has the same (unknown) distribution as X:
    EXi = EX = µ and Var(Xi) = Var(X) = σ²
• Sample mean (otoskeskiarvo): X̄(n) = (1/n) ∑_i Xi
  – X̄(n) is an unbiased estimator (harhaton estimaatti) for EX = µ: E X̄(n) = µ
  – with large n, the sample mean is close to µ
  – how close? how large an n?
• Variance of the sample mean: Var(X̄(n)) = σ²/n
  – problem: σ² is unknown
  – solution: use the sample variance (otosvarianssi) s²(n) instead:
    s²(n) = ∑_i [Xi − X̄(n)]² / (n − 1) = (∑_i Xi² − n X̄²(n)) / (n − 1)
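The sample mean and the two equivalent forms of the sample variance can be written out directly. Again an illustrative sketch rather than course code:

```python
def sample_mean(xs):
    """X-bar(n) = (1/n) * sum of the sample points X_i."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """s^2(n) = sum_i (X_i - X-bar(n))^2 / (n - 1)."""
    n, xbar = len(xs), sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def sample_variance_alt(xs):
    """Equivalent form: (sum_i X_i^2 - n * X-bar(n)^2) / (n - 1)."""
    n, xbar = len(xs), sample_mean(xs)
    return (sum(x * x for x in xs) - n * xbar ** 2) / (n - 1)
```

For the sample 2, 4, 6, 8 both variance forms give 20/3; the variance of the sample mean is then estimated by s²(n)/n.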
  – then s²(n)/n = ∑_i [Xi − X̄(n)]² / (n(n − 1)) is an unbiased estimator for Var(X̄(n))
  – we now know how good an estimator the sample mean is

Central Limit Theorem (CLT)
• "When n becomes large, the normalized sample mean becomes normally distributed":
  (X̄(n) − µ) / √(σ²/n) → N(0, 1) as n → ∞

How to Use the Central Limit Theorem?
• By the CLT,
  P( −z_{1−α/2} ≤ (X̄(n) − µ)/√(σ²/n) ≤ z_{1−α/2} ) = 1 − α
• Problem: what if n is not very large?
• Solution: use the Student distribution instead:
  (X̄(n) − µ) / √(s²(n)/n) ≈ T(n − 1),
  where n − 1 is the number of degrees of freedom (vapausaste) of T
  – use N(0, 1) when n > 30, T(n − 1) when n ≤ 30

[Figure: density of the normalized sample mean; the 90% confidence interval spans (−1.65, 1.65), the 60% confidence interval (−0.84, 0.84)]
• Distribution for the normalized µ, based on the sample
• A 60% CI is narrower, but more likely to be wrong: there is a 40% chance that it does not contain the true mean

Effect of Sample Size, Strong Law of Large Numbers
• Distribution N(µ, s²) for µ (the true, unknown mean), based on the sample
  – large sample: s² small
  – small sample: s² large
[Figure: two densities around µ, a wide one for a small sample and a narrow one for a large sample]
• Strong Law of Large Numbers (suurten lukujen laki):
  X̄(n) → µ as n → ∞, with probability 1

Example A
• Question: X is normally distributed. What is the mean µ of X?
• Answer:
  – Take a sample of 10 observations of X:
    1.20, 1.50, 1.68, 1.89, 0.95, 1.49, 1.58, 1.55, 0.50, 1.09
  – compute X̄(10) and s²(10): 1.34 and 0.17
  – get the 90% percentile of the T-distribution with 9 degrees of freedom:
    t_{9, 0.95} = 1.83
  – compute the 90% confidence interval for µ: X̄(10) ± t_{9, 0.95} √(s²(10)/10)
Confidence Interval (CI) (luottamusväli)
• Use the CLT (keskeinen raja-arvolause) for percentiles, solve for µ, and get a confidence interval for µ
• Use the unbiased estimator s²(n) for σ²; n − 1 is the degrees of freedom for T
• Percentiles for N(0, 1) and T(k) are tabulated
• (1 − α) confidence interval:
  P( µ ∈ X̄(n) ± z_{1−α/2} √(s²(n)/n) ) = 1 − α
• 90% confidence interval, α = 10%:
  P( µ ∈ X̄(n) ± z_{0.95} √(s²(n)/n) ) = 90%
  or
  P( µ ∈ X̄(n) ± t_{n−1, 0.95} √(s²(n)/n) ) = 90%
• Example A continued: X̄(10) ± t_{9, 0.95} √(s²(10)/10) = (1.34 − 0.24, 1.34 + 0.24) = (1.10, 1.58)

Example B
• Question: I know the distribution. Is the mean µ = 5? ("Two-tailed hypothesis" H0)
• Answer:
  – Take a sample of 10 and compute the test statistic (testisuure) t9:
    t9 = (X̄(10) − 5) / √(s²(10)/10)
  – get the 90% confidence interval of the T-distribution with 9 degrees of freedom:
    (−t_{9, 0.95}, t_{9, 0.95}) = (−1.83, 1.83)
  – if the test statistic t9 is not in there, reject H0, i.e., say "no"
    – 10% chance of being wrong! (why?)
  – if t9 is in there, accept H0, i.e., say "yes"

Example C
• Question: Do X and Y have the same mean?
• Answer:
  – Take samples of X and Y, with sample averages X̄(n) and Ȳ(m)
  – compute the test statistic:
    t = (X̄(n) − Ȳ(m)) / ( s √(1/n + 1/m) ),
    where s² = (1/(n + m − 2)) ( ∑_i (Xi − X̄)² + ∑_i (Yi − Ȳ)² )
  – get the 90% confidence interval of the T-distribution with n + m − 2 degrees of freedom
  – if t is not in there, reject H0, i.e., say "no"
    – 10% chance of being wrong

Usual Assumptions That Usually Do Not Apply
• Normal distribution
  – central limit theorem
  – strong law of large numbers
• Independent samples
  – X1, X2, ...
• The assumptions that probability theory and statistics require usually do not apply
• The results may still be usable
• Watch out, and always check how far you are from the required assumptions
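Examples A and B above can be replayed numerically. This sketch uses the ten observations listed in Example A, with the table value t_{9, 0.95} = 1.83 hard-coded as it is on the slides:

```python
from math import sqrt

# The ten observations from Example A.
data = [1.20, 1.50, 1.68, 1.89, 0.95, 1.49, 1.58, 1.55, 0.50, 1.09]
n = len(data)

xbar = sum(data) / n                               # X-bar(10), about 1.34
s2 = sum((x - xbar) ** 2 for x in data) / (n - 1)  # s^2(10), about 0.17

t_9_095 = 1.83                                     # 90% two-sided quantile of T(9)
half = t_9_095 * sqrt(s2 / n)                      # half-width of the 90% CI
ci = (xbar - half, xbar + half)                    # about (1.10, 1.58)

# Example B: two-tailed test of H0: mu = 5 against the same sample.
t9 = (xbar - 5) / sqrt(s2 / n)
reject_h0 = not (-t_9_095 <= t9 <= t_9_095)        # H0 is rejected here
```

Since t9 falls far outside (−1.83, 1.83), the hypothesis µ = 5 is rejected at the 10% risk level.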