EC381/MN308 Probability and Some Statistics – 2007/2008
Lecture 18
Yannis Paschalidis
yannisp@bu.edu, http://ionia.bu.edu/
Dept. of Manufacturing Engineering
Dept. of Electrical and Computer Engineering
Center for Information and Systems Engineering

Lecture 18 - Outline
1. Sums of Random Variables
   a. Motivation for limit theorems
   b. Transforms
   c. Central limit theorem

Limit Theorems
• Objective: given many joint random variables in an experiment:
  – Can we determine limits of functions of these random variables?
  – Example: the average of many random variables.
  – What does convergence mean in this context?
• Example: collecting many independent samples for an experiment.
  – Theoretical foundations of empirical statistics.

What are limit theorems?
• Limit theorems specify the probabilistic behavior of sums of random variables as n → ∞.
• These are limits as in calculus, but now of a random sequence.
• Restrictions on the RVs are required for the limits to exist, such as:
  – Independent random variables
  – Uncorrelated random variables
  – Identical marginal CDFs/PDFs/PMFs
  – Identical means and/or variances
  – Other …

Motivating Example
• Consider a Bernoulli random variable X with parameter p.
• Suppose we repeat the experiment independently, generating samples Xi, i = 1, 2, …, n.
• Define the derived random variable Zn = (X1 + X2 + … + Xn)/n.
  – This is the sample average.
• What happens to Zn as n → ∞?
  – It should be close to p, one hopes… limit theorems make this precise!

Mean/Variance of a Sum of RVs
• Consider Wn = X1 + X2 + … + Xn, where Xi has mean μi.
• Mean:
  E[Wn] = Σ(i=1..n) E[Xi] = Σ(i=1..n) μi
• Variance:
  Var(Wn) = E[(Σ(i=1..n) (Xi − μi))²]
          = E[(Σ(i=1..n) (Xi − μi)) (Σ(j=1..n) (Xj − μj))]
          = Σ(i=1..n) Σ(j=1..n) Cov(Xi, Xj)
          = Σ(i=1..n) Var(Xi) + 2 Σ(i=1..n−1) Σ(j=i+1..n) Cov(Xi, Xj)

Average of n Random Variables
• Assume joint RVs X1, …, Xn with finite expected values μ1, …, μn.
• Define the derived RV Zn = (X1 + … + Xn)/n.
• E[Zn] = (E[X1] + … + E[Xn])/n, by linearity of E[·].
• So E[Zn] = (μ1 + … + μn)/n.
• Note: if all Xi have the same mean μ, then E[Zn] = μ for all n.

Average of n RVs - 2
• Motivation: an experiment that generates the RV X is repeated independently to generate samples Xi.
  – Assume each Xi has finite mean μ.
  – Since the Xi are generated by independent repetitions of the same experiment, they are i.i.d. (independent and identically distributed).
• Definition: the sample mean Zn is the average of the Xi:
  Zn = (X1 + … + Xn)/n
• What happens to Zn as n → ∞?
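The sample-average behavior above can be checked empirically. Below is a minimal sketch in plain Python (function and parameter names are illustrative, not from the course): it repeats the n-sample Bernoulli experiment many times and compares the mean and variance of Zn against the predictions E[Zn] = p and Var[Zn] = p(1 − p)/n.

```python
import random
import statistics

def bernoulli_sample_means(n, p, trials, seed=0):
    """Sample mean Zn = (X1 + ... + Xn)/n over many independent trials,
    with Xi ~ Bernoulli(p)."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n
            for _ in range(trials)]

n, p = 100, 0.3
means = bernoulli_sample_means(n, p, trials=20_000)
avg = statistics.fmean(means)      # should be close to p
var = statistics.pvariance(means)  # should be close to p*(1-p)/n
```

With n = 100 the spread of Zn around p is already small; increasing n shrinks the empirical variance in proportion to 1/n.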
Average of n RVs - 3
• Zn is a derived random variable with mean E[Zn] = E[Xi] = μ.
• Zn is a sum of independent random variables (uncorrelated would have been enough), so
  Var[Zn] = Var[X1/n] + … + Var[Xn/n]
• Var[Xi/n] = Var[Xi]/n² by the scaling law, and Var[Xi] = Var[X] for all i, hence
  Var[Zn] = n Var[X]/n² = Var[X]/n

Average of n RVs - 4
• As n → ∞, the variance of Zn → 0.
  – There is less and less uncertainty in Zn; it is very close to its mean μ.
• Note: the results above hold even if the Xi are merely uncorrelated instead of independent.
  – All that is needed is Cov[Xi, Xj] = 0 for i ≠ j.
• Zn is an estimate of μ, the expected value of X.
  – Given many samples, the estimate has a very small error.

Example: Quiz 7.1
• Let X be exponential with λ = 1, so E[X] = 1/λ = 1. Let Zn denote the sample mean of n independent samples of X.
• How many samples are needed so that the variance of the sample mean is less than 0.01?
• We have Var[X] = 1/λ² = 1, so Var[Zn] = Var[X]/n = 1/n.
• This requires 101 samples, since 100 samples gives exactly 0.01.

A Useful Statistic: the Moment Generating Function
• For continuous RVs, the density fX(x) integrates to 1.
  – It has a Fourier transform!
  – It has a two-sided Laplace transform with a region of convergence.
• Definition: the moment generating function of a RV X is
  ΨX(s) = E[e^(sX)]
  – Called φX(s) in the Yates/Goodman text.
  – It is a two-sided Laplace transform of fX(x).
  – MGFs of common random variables can be looked up in tables.
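The definition ΨX(s) = E[e^(sX)] can be sanity-checked numerically. The sketch below (standard-library Python; the names are illustrative) estimates the MGF of an Exponential(λ) variable by simulation and compares it with the table value λ/(λ − s), which holds for s < λ:

```python
import math
import random

def mgf_estimate(s, lam=1.0, samples=200_000, seed=3):
    """Monte Carlo estimate of Psi_X(s) = E[e^(sX)] for X ~ Exponential(lam).
    The expectation is finite only for s < lam (the region of convergence)."""
    rng = random.Random(seed)
    return sum(math.exp(s * rng.expovariate(lam))
               for _ in range(samples)) / samples

s, lam = 0.25, 1.0
est = mgf_estimate(s, lam)
exact = lam / (lam - s)  # table value of the exponential MGF
```

Choosing s well inside the region of convergence (s = 0.25 here) keeps the Monte Carlo estimate well behaved; near s = λ the summand becomes heavy-tailed and the estimate degrades.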
Moment Generating Functions: why would one care?
• One can get the moments of X by differentiating the MGF (which is why it has that name):
  E[Xⁿ] = (dⁿ/dsⁿ) ΨX(s) evaluated at s = 0

A Most Useful Property
• Let X and Y be independent jointly continuous RVs and Z = X + Y. Then
  ΨZ(s) = ΨX(s) ΨY(s)
  – The Laplace (or Fourier) transform of a convolution is the product of the Laplace (or Fourier) transforms, as in signals and systems.
• See Table 6.1 (p. 249) for the moment generating functions of common families of random variables.

A Particularly Useful Moment Generating Function: the Gaussian
• THE MGF OF A GAUSSIAN PDF IS ITSELF GAUSSIAN!
  – For X ~ N(μ, σ²): ΨX(s) = e^(μs + σ²s²/2)

Random Sum of i.i.d. RVs
• R = X1 + ⋯ + XN, where the Xi are i.i.d. and N is random, independent of the Xi.
• Mean:
  E[R] = E[E[R | N = n]] = E[N E[X]] = E[N] E[X]
• Second moment:
  E[R²] = E[E[R² | N = n]]
        = E[Var(R | N = n)] + E[(E[R | N = n])²]
        = E[n Var(X)] + E[n² (E[X])²]
        = E[N] Var(X) + E[N²] (E[X])²
• Variance:
  Var(R) = E[R²] − (E[N] E[X])² = E[N] Var(X) + (E[X])² Var(N)
• MGF:
  E[e^(sR)] = E[E[e^(s(X1+⋯+XN)) | N = n]] = E[(ΨX(s))ᴺ] = E[e^(N ln ΨX(s))] = ΨN(ln ΨX(s))

The Central Limit Theorem
• The SLLN says Zn = (X1 + … + Xn)/n converges to μ with probability one.
  – This implies the CDF of Zn converges to a unit step at μ.
  – Can we say more?
• Assume the Xi have finite variance σX². Then Zn − μ is a zero-mean random variable with variance σX²/n.
  – Normalize:
    Yn = (Zn − μ)/(σX/√n) = (X1 + … + Xn − nμ)/(σX √n)
  – Yn has variance 1 and mean zero for all n.
• What is the CDF of Yn?
• Let X be a standard Gaussian, zero mean, variance 1, with CDF Φ(y) = P(X ≤ y).

CLT - 2
• Central Limit Theorem: given i.i.d. RVs with finite mean μ and finite variance σX², the CDF of Yn converges to Φ(·), the CDF of a unit Gaussian RV.
• For each real number y,
  lim(n→∞) P(Yn ≤ y) = Φ(y)

CLT - 3
• The CLT allows us to compute probabilities for Zn for finite but large n.
  – Probabilities referring to the difference between Zn and μ can be approximated by the Gaussian CDF.
  – Note: this does not say that the difference between Zn and μ is Gaussian.
  – Nevertheless, this explains the popularity and importance of Gaussian models.

CLT - 4
• Computation: after rescaling, this can be interpreted as approximating the CDF of the sum X1 + … + Xn by that of a Gaussian, N(nμ, nσX²).
  – This is a good approximation for |a − nμ| < 3 n^(1/2) σX.
  – It is not so good for outliers (the "tails" of the distribution).

Using the CLT
• Quiz 6.6: each disk access takes X milliseconds, uniformly distributed in [0, 12], and one must access the disk 12 times independently.
  – T = X1 + … + X12
  – E[T] = 12 E[X] = 12 · 6 = 72 msec
  – Var[T] = 12 Var[X] = 12 · 12²/12 = 144 msec²
  – σT = 12 msec
• But what if we want to know P(T > 75 msec)?
• Turn to the CLT: T is a sum of i.i.d. RVs with finite variance, so the CDF of T is approximately N(72, 144):
  P(T > 75) ≈ Q((75 − 72)/12) = Q(0.25)
  P(T < 48) ≈ Q((72 − 48)/12) = Q(2)
• To do this exactly would require 11 convolutions!
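The Quiz 6.6 numbers are easy to reproduce in code. The sketch below is plain Python, not part of the original quiz: Q is written via math.erfc, and a Monte Carlo simulation of the sum of 12 uniforms checks how good the CLT approximation of P(T > 75) actually is.

```python
import math
import random

def Q(x):
    """Gaussian tail probability Q(x) = 1 - Phi(x), via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2))

# CLT approximation for T = X1 + ... + X12, Xi ~ Uniform[0, 12]:
# E[T] = 72, Var[T] = 144, so sigma_T = 12 and P(T > 75) ~ Q((75 - 72)/12).
p_clt = Q(0.25)

# Monte Carlo check of the approximation.
rng = random.Random(2)
trials = 200_000
hits = sum(sum(rng.uniform(0.0, 12.0) for _ in range(12)) > 75
           for _ in range(trials))
p_mc = hits / trials
```

Even at n = 12 the sum of uniforms is very close to Gaussian, so the simulated probability lands within a fraction of a percent of Q(0.25) ≈ 0.401.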
Using the CLT Again
• A modem transmits 10⁴ bits.
  – Each bit is 0 or 1, i.i.d., with p = 0.5.
  – Estimate P(number of 1's > 5100).
  – Estimate P(number of 1's ∈ (4800, 5100)).
• Let T = the number of 1's.
  – E[T] = 5000; Var[T] = 10⁴ · Var[Bernoulli(0.5)] = 2500, so the standard deviation is 50.
  – P(T > 5100) ≈ Q(100/50) = Q(2)
  – P(T < 4800) ≈ Q(200/50) = Q(4)
  – P(T ∈ (4800, 5100)) ≈ 1 − Q(2) − Q(4)

CLT Demonstration (figure)

Proving the CLT
• Define the derived random variable
  Yn = (X1 + … + Xn − nμ)/(σX √n)
• Define new random variables
  Wi = (Xi − μ)/(σX √n), so that Yn = W1 + … + Wn
  – The Wi are i.i.d., zero-mean, with variance 1/n.

Proving the CLT - 2
• MGF of each W:
  ΨW(s) = E[e^(sW)]
• MGF of Yn: since the Wi are i.i.d.,
  ΨYn(s) = (ΨW(s))ⁿ

CLT Proof - 3
• Let the Xi be zero mean (wlog; this simplifies the algebra). Then Wi = Xi/(σX √n) and ΨW(s) = ΨX(s/(σX √n)).
• ΨX(s) has a Taylor series expansion around zero:
  ΨX(s) = 1 + s E[X] + (s²/2) E[X²] + ⋯ = 1 + (σX² s²)/2 + ⋯

CLT Proof - 4
• Now substitute back:
  ΨYn(s) = (ΨX(s/(σX √n)))ⁿ = (1 + s²/(2n) + ⋯)ⁿ
• As n → ∞, the higher-order terms become negligible for each fixed s, and
  ΨYn(s) → e^(s²/2)
  – The MGF of the scaled difference from the mean, Yn, converges to the Gaussian MGF; this leads to the CLT.
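The limit (1 + s²/(2n) + ⋯)ⁿ → e^(s²/2) in the proof can be watched numerically. A small sketch in standard-library Python, where the choice X = ±1 with probability 1/2 each (zero mean, unit variance, ΨX(s) = cosh(s)) is an illustrative assumption:

```python
import math

def mgf_yn(s, n):
    """MGF of Yn = (X1 + ... + Xn)/sqrt(n), where X = +1 or -1 with probability
    1/2 each (zero mean, unit variance), so Psi_X(s) = cosh(s) and
    Psi_Yn(s) = cosh(s/sqrt(n))**n."""
    return math.cosh(s / math.sqrt(n)) ** n

s = 1.0
gaussian_mgf = math.exp(s * s / 2)              # the limit e^(s^2/2)
approx = [mgf_yn(s, n) for n in (1, 10, 1000)]  # approaches gaussian_mgf
```

The gap to the Gaussian MGF shrinks monotonically as n grows, exactly as the Taylor-expansion argument predicts.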