EC381/MN308
Probability and Some Statistics

Yannis Paschalidis
yannisp@bu.edu, http://ionia.bu.edu/
Dept. of Manufacturing Engineering
Dept. of Electrical and Computer Engineering
Center for Information and Systems Engineering

Lecture 18 - Outline
1. Sums of Random Variables
a. Motivation for limit theorems
b. Transforms
c. Central limit theorem
Limit Theorems

• Objective: Given many joint random variables in an experiment:
– Can we determine limits of functions of these random variables?
– Example: average of many random variables
– What does convergence in this context mean?
• Example: Collecting many independent samples for an experiment
– Theoretical foundations of empirical statistics.

What are limit theorems?

• Limit theorems specify the probabilistic behavior of sums of random variables as n → ∞
• Limits as in calculus, but now we deal with the limit of a random sequence
• Will require restrictions on the RVs so that limits exist, such as:
– Independent random variables
– Uncorrelated random variables
– Identical marginal CDFs/PDFs/PMFs
– Identical means and/or variances
– Other …
Motivating Example

• Bernoulli random variable X with parameter p
• Suppose we repeat the experiment independently, generating samples Xi, i = 1, 2, …, n
• Define the derived random variable
Zn = (X1 + X2 + … + Xn)/n
– This is the sample average
• What happens to Zn as n → ∞?
– Should be close to p, one hopes… make this precise!

Mean/Variance of Sum of RVs

• Consider Wn = X1 + X2 + … + Xn

$$E[W_n] = \sum_{i=1}^{n} E[X_i] = \sum_{i=1}^{n} \mu_i$$

$$\begin{aligned}
\mathrm{Var}(W_n) &= E\left[\left(\sum_{i=1}^{n} (X_i - \mu_i)\right)^{2}\right] \\
&= E\left[\left(\sum_{i=1}^{n} (X_i - \mu_i)\right)\left(\sum_{j=1}^{n} (X_j - \mu_j)\right)\right] \\
&= \sum_{i=1}^{n} \sum_{j=1}^{n} \mathrm{Cov}(X_i, X_j) \\
&= \sum_{i=1}^{n} \mathrm{Var}(X_i) + 2 \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \mathrm{Cov}(X_i, X_j)
\end{aligned}$$
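The motivating example is easy to check numerically. Below is a minimal simulation sketch (not from the original slides; it assumes numpy, and the value of p is an arbitrary illustrative choice) showing the sample average Zn settling near p as n grows:

import numpy as np

rng = np.random.default_rng(0)
p = 0.3  # illustrative Bernoulli parameter (not specified in the slides)

# Sample average Z_n = (X_1 + ... + X_n)/n for growing n
for n in [10, 100, 1_000, 10_000, 100_000]:
    samples = rng.binomial(1, p, size=n)  # n independent Bernoulli(p) draws
    print(f"n = {n:>6}: Z_n = {samples.mean():.4f} (p = {p})")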
Average of n Random Variables

• Assume joint RVs X1, …, Xn with finite expected values μ1, …, μn
• Define the derived RV Zn = (X1 + … + Xn)/n
• E[Zn] = (E[X1] + … + E[Xn])/n
– Linearity of E[•]
• E[Zn] = (μ1 + … + μn)/n
• Note: if all Xi had the same mean μ, then E[Zn] = μ for all n

Average of n RVs - 2

• Motivation: an experiment that generates RV X is repeated independently to generate samples Xi
– Assume Xi has finite mean μ
– Since the Xi are generated by independent samples of the same experiment, they are i.i.d. (independent and identically distributed)
• Definition: The sample mean Zn is the average of the Xi:
Zn = (X1 + … + Xn)/n
• What happens to Zn as n → ∞?

Average of n RVs - 3

• Zn is a derived random variable
– Has mean E[Zn] = E[Xi] = μ
• Since Zn is a sum of independent random variables (uncorrelated would have been enough):
Var[Zn] = Var[X1/n] + … + Var[Xn/n]
• Note:
– All that is needed is Cov[Xi, Xj] = 0 if i differs from j
– Var[X1] = Var[Xi] for all i
– Var[Xi/n] = Var[Xi]/n² by the scaling law
⇒ Var[Zn] = n Var[X]/n² = Var[X]/n

Average of n RVs - 4

• As n → ∞, the variance of Zn → 0
– Less and less uncertainty in Zn
– This implies that Zn is very close to its average μ
• Note that the above results would hold even if the variables Xi were uncorrelated instead of independent
• Zn is an estimate of μ, the expected value of X
– Given many samples, the estimate has a very small error
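The 1/n variance scaling is also easy to verify by simulation. A small sketch (numpy assumed; standard normal samples are an arbitrary illustrative choice, so Var[X] = 1):

import numpy as np

rng = np.random.default_rng(0)
var_x = 1.0  # Var[X] = 1 for standard normal samples

for n in [1, 10, 100, 1000]:
    # 20,000 independent realizations of Z_n, each an average of n samples
    z = rng.standard_normal((20_000, n)).mean(axis=1)
    print(f"n = {n:>4}: empirical Var[Z_n] = {z.var():.5f}, Var[X]/n = {var_x/n:.5f}")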
A Useful Statistic: Moment Generating Function

• For continuous RVs, the density fX(x) integrates to 1
– It has a Fourier transform!
– It has a two-sided Laplace transform with a region of convergence
• Definition: The moment generating function of a RV X is defined as
ΨX(s) = E[e^{sX}]
– Called φX(s) in the Yates/Goodman text
– The two-sided Laplace transform of fX(x)
– Can look them up in tables

Example: Quiz 7.1

• Let X be exponential with λ = 1 (E[X] = 1/λ = 1). Let Zn denote the sample mean of n independent samples of X.
• How many samples are needed so that the variance of the sample mean is less than 0.01?
– We have Var[X] = 1/λ² = 1.
– Thus, Var[Zn] = Var[X]/n = 1/n
⇒ Requires 101 samples, since 100 samples gives exactly 0.01
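As a sketch of how the definition is used (sympy assumed; not part of the original slides), one can compute ΨX(s) for the exponential RV of Quiz 7.1 and recover its moments, and hence Var[Zn] = 1/n, by differentiation:

import sympy as sp

s, x = sp.symbols('s x', real=True)
lam = 1  # exponential rate, as in Quiz 7.1

# Psi_X(s) = E[e^{sX}] = integral of e^{sx} * lam*e^{-lam*x} over x >= 0
# (valid for s < lam; conds='none' drops the convergence condition)
psi = sp.integrate(sp.exp(s*x) * lam * sp.exp(-lam*x), (x, 0, sp.oo), conds='none')
print(sp.simplify(psi))  # 1/(1 - s) for lam = 1

# Moments by differentiating at s = 0: E[X^k] = k-th derivative of Psi at 0
ex  = sp.diff(psi, s, 1).subs(s, 0)   # E[X]   = 1
ex2 = sp.diff(psi, s, 2).subs(s, 0)   # E[X^2] = 2
print(sp.simplify(ex2 - ex**2))       # Var[X] = 1, so Var[Z_n] = 1/n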
Moment Generating Functions: Why would one care?

• Can get moments by differentiating the MGF (which is why it has that name)

A Most Useful Property

• X, Y independent jointly continuous RVs
• Z = X + Y
– The Laplace (or Fourier) transform of a convolution is the product of the Laplace (or Fourier) transforms, as in signals and systems: ΨZ(s) = ΨX(s) ΨY(s)
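A quick symbolic check of this property (a sympy sketch, with X and Y taken as i.i.d. exponential(1) purely for illustration): the MGF of Z = X + Y computed from the convolution density equals the product ΨX(s)ΨY(s).

import sympy as sp

s = sp.symbols('s')
x, z = sp.symbols('x z', nonnegative=True)

# Density of Z = X + Y by convolution: f_Z(z) = z * e^{-z}
f_z = sp.integrate(sp.exp(-x) * sp.exp(-(z - x)), (x, 0, z))

# MGF of Z directly, and the product of the two individual MGFs
psi_z = sp.integrate(sp.exp(s*z) * f_z, (z, 0, sp.oo), conds='none')
psi_x = sp.integrate(sp.exp(s*x) * sp.exp(-x), (x, 0, sp.oo), conds='none')
print(sp.simplify(psi_z - psi_x**2))  # -> 0, i.e. Psi_Z = Psi_X * Psi_Y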
A Particularly Useful Moment Generating Function: the Gaussian

• X is a standard Gaussian, zero mean, variance 1: ΨX(s) = e^{s²/2}
• THE MGF OF A GAUSSIAN PDF IS ITSELF GAUSSIAN!

Table 6.1 (p. 249)
Moment generating function for families of random variables.

Random Sum of i.i.d. RVs

• R = X1 + … + XN, where the Xi are i.i.d. and N is random

$$E[R] = E[E[R \mid N]] = E[N\,E[X]] = E[N]\,E[X]$$

$$\begin{aligned}
E[R^2] &= E[E[R^2 \mid N]] \\
&= E\bigl[E[R^2 - (E[R \mid N])^2 \mid N]\bigr] + E\bigl[(E[R \mid N])^2\bigr] \\
&= E[\mathrm{Var}(R \mid N)] + E[N^2 (E[X])^2] \\
&= E[N]\,\mathrm{Var}(X) + E[N^2](E[X])^2
\end{aligned}$$

$$\mathrm{Var}(R) = E[R^2] - (E[N]E[X])^2 = E[N]\,\mathrm{Var}(X) + (E[X])^2\,\mathrm{Var}(N)$$

$$E[e^{sR}] = E\bigl[E[e^{s(X_1+\cdots+X_N)} \mid N]\bigr] = E[(\Psi_X(s))^N] = E[e^{N \ln \Psi_X(s)}] = \Psi_N(\ln \Psi_X(s))$$

The Central Limit Theorem

• The SLLN says Zn = (X1 + … + Xn)/n converges to μ with probability one
– Implies the CDF of Zn converges to a unit step at μ
– Can we say more?
• Assume the Xi have finite variance σX². Then, Zn − μ is a zero-mean random variable with variance σX²/n
– Normalize: Yn = √n (Zn − μ)/σX
• Variance 1, mean zero for all n.
• What is the CDF of Yn?
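These random-sum formulas can be spot-checked by simulation. A sketch (numpy assumed; the Poisson/exponential choices are arbitrary illustrations, not from the slides):

import numpy as np

rng = np.random.default_rng(0)

mean_n = var_n = 20.0     # N ~ Poisson(20), so E[N] = Var(N) = 20
mean_x, var_x = 2.0, 4.0  # X ~ exponential with mean 2, so Var(X) = 4

N = rng.poisson(mean_n, size=100_000)
# Each realization of R is the sum of N i.i.d. exponential draws
R = np.array([rng.exponential(mean_x, size=n).sum() for n in N])

print("E[R]  :", R.mean(), " vs E[N]E[X] =", mean_n * mean_x)
print("Var(R):", R.var(), " vs E[N]Var(X) + (E[X])^2 Var(N) =",
      mean_n * var_x + mean_x**2 * var_n)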
CLT - 2

• Central Limit Theorem: Given iid RVs with finite mean μ and finite variance σX², the CDF of Yn = √n (Zn − μ)/σX converges to Φ(·), the CDF of a unit Gaussian RV
⇒ For each real number y,

$$\lim_{n \to \infty} P(Y_n \le y) = \Phi(y)$$

CLT – 3

• The CLT allows us to compute probabilities of Zn for finite, but large n
– Probabilities referring to the differences between Zn and μ can be approximated by the Gaussian CDF
– Note: this does not say the difference between Zn and μ is a Gaussian
– Nevertheless, this explains the popularity and importance of Gaussian models
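To see the convergence of the CDF concretely, here is a small simulation sketch (numpy assumed; Bernoulli(0.5) is an arbitrary choice) comparing the empirical CDF of Yn with Φ at a few points:

import numpy as np
from math import erf, sqrt

def phi(y):  # standard normal CDF
    return 0.5 * (1 + erf(y / sqrt(2)))

rng = np.random.default_rng(0)
p, mu, sigma = 0.5, 0.5, 0.5  # Bernoulli(0.5): mean p, sigma = sqrt(p(1-p))

for n in [10, 100, 1000]:
    z = rng.binomial(1, p, size=(50_000, n)).mean(axis=1)  # sample means Z_n
    y = np.sqrt(n) * (z - mu) / sigma                      # normalized Y_n
    for pt in [-1.0, 0.0, 1.0]:
        print(f"n={n:>4}, y={pt:+.0f}: P(Y_n <= y) ~ {(y <= pt).mean():.3f}, "
              f"Phi(y) = {phi(pt):.3f}")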
CLT - 4

• Computation: after rescaling, this can be interpreted as approximating the CDF of the sum (X1 + … + Xn) by that of a Gaussian, N(nμ, nσX²)
– Good approximation for |a − nμ| < 3 n^{1/2} σX
– Not so good for outliers ("tails" of the distribution)

Using the CLT

• Quiz 6.6: X milliseconds for each disk access time, uniformly distributed in [0, 12]. Assume one must access the disk 12 times independently.
– T = X1 + … + X12
– E[T] = 12 E[X] = 12 · 6 = 72 msec
– Var[T] = 12 Var[X] = 12 · 12²/12 = 144 msec²
– σT = 12 msec
• BUT what if we want to know P(T > 75 msec)?
• Turn to the CLT: T is a sum of iid RVs with finite variances ⇒ the CDF of T is approximately N(72, 144)
– P(T > 75) ≈ Q(3/12) = Q(0.25)
– P(T < 48) ≈ Q(24/12) = Q(2)
• To do this exactly would require 12 convolutions!

Using CLT again

• Modem transmits 10⁴ bits
– Each bit 0 or 1, i.i.d., p = 0.5
– Estimate P(number of 1's > 5100)
– Estimate P(number of 1's ∈ (4800, 5100))
• T = number of 1's
• E[T] = 5000; Var[T] = 10⁴ · Var[Bernoulli(0.5)] = 2500, so the standard deviation is 50
• P(T > 5100) ≈ Q(100/50) = Q(2)
• P(T < 4800) ≈ Q(200/50) = Q(4)
• P(T ∈ (4800, 5100)) ≈ 1 − Q(2) − Q(4)

CLT Demonstration
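The Quiz 6.6 numbers can be sanity-checked against a direct simulation (a numpy sketch, not in the original slides):

import numpy as np
from math import erf, sqrt

def Q(y):  # Gaussian tail probability Q(y) = 1 - Phi(y)
    return 0.5 * (1 - erf(y / sqrt(2)))

rng = np.random.default_rng(0)
T = rng.uniform(0, 12, size=(1_000_000, 12)).sum(axis=1)  # 12 disk accesses

print("P(T > 75):", (T > 75).mean(), " CLT: Q(0.25) =", round(Q(0.25), 4))
print("P(T < 48):", (T < 48).mean(), " CLT: Q(2)    =", round(Q(2.0), 4))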
Proving the CLT

• Define the derived random variable:
Yn = (X1 + … + Xn − nμ)/(σX √n)
• New random variables Wi are defined as
Wi = (Xi − μ)/(σX √n), so that Yn = W1 + … + Wn
• The Wi are iid, zero-mean, with variance 1/n

Proving the CLT - 2

• MGF of W:
ΨW(s) = E[e^{sW}] = E[e^{s(X−μ)/(σX √n)}]
• MGF of Yn: since the Wi are independent,
ΨYn(s) = (ΨW(s))^n

CLT Proof - 3

• Let the Xi be zero mean (wlog, simplifies the algebra)
• Now, note that ΨX(s) has a Taylor series expansion around zero, as
ΨX(s) = 1 + s E[X] + (s²/2) E[X²] + … = 1 + (s²/2) σX² + o(s²)

CLT Proof - 4

• Now, substitute back:
ΨYn(s) = (ΨX(s/(σX √n)))^n = (1 + s²/(2n) + o(1/n))^n
• As n → ∞, all terms but the first go to zero for each fixed s!
– The MGF of the scaled difference from the mean, Yn, converges to the Gaussian MGF e^{s²/2}, which leads to the CLT
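The punchline of the proof, the MGF of Yn tending to the Gaussian MGF, can also be seen numerically: for zero-mean, unit-variance Xi the substitution gives ΨYn(s) ≈ (1 + s²/(2n))^n, which converges to e^{s²/2}. A small numeric sketch (s = 1.5 chosen arbitrarily):

import numpy as np

s = 1.5  # fixed point at which to evaluate the MGFs
for n in [1, 10, 100, 10_000, 1_000_000]:
    print(f"n = {n:>9}: (1 + s^2/(2n))^n = {(1 + s**2/(2*n))**n:.6f}")
print(f"Gaussian MGF e^(s^2/2) = {np.exp(s**2/2):.6f}")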