Introduction to Probability
Stat 134, Fall 2005, Berkeley
Lectures prepared by: Elchanan Mossel and Elena Shvets
Follows Jim Pitman's book: Probability
Section 3.3
[Figure, Histo 1: X = 2·Bin(300, 1/2) − 300, E[X] = 0; histogram on values −50 to 50.]
[Figure, Histo 2: Y = 2·Bin(30, 1/2) − 30, E[Y] = 0; histogram on values −50 to 50.]
[Figure, Histo 3: Z = 4·Bin(10, 1/4) − 10, E[Z] = 0; histogram on values −50 to 50.]
[Figure, Histo 4: W = 0, E[W] = 0; all probability mass at 0.]
A natural question:
• Is there a good parameter that allows us to distinguish between these distributions?
• Is there a way to measure the spread?
Variance and Standard Deviation
• The variance of X, denoted by Var(X), is the mean squared deviation of X from its expected value μ = E(X):
Var(X) = E[(X − μ)²].
• The standard deviation of X, denoted by SD(X), is the square root of the variance of X:
SD(X) = √Var(X).
Computational Formula for Variance
Claim: Var(X) = E(X²) − E(X)².
Proof:
E[(X − μ)²] = E[X² − 2μX + μ²]
= E[X²] − 2μ·E[X] + μ²
= E[X²] − 2μ² + μ²
= E[X²] − E[X]².
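As a quick numerical check, here is a minimal Python sketch comparing the two formulas; the fair six-sided die is an illustrative choice, not part of the slide:

```python
import numpy as np

# Check that E[(X - mu)^2] and E[X^2] - E[X]^2 agree
# for a fair six-sided die (illustrative choice).
values = np.arange(1, 7)
probs = np.full(6, 1 / 6)

mu = np.sum(values * probs)                     # E[X]
var_def = np.sum((values - mu) ** 2 * probs)    # E[(X - mu)^2]
var_comp = np.sum(values ** 2 * probs) - mu**2  # E[X^2] - E[X]^2

print(mu, var_def, var_comp)  # 3.5  2.9166...  2.9166...
```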
Properties of Variance and SD
1. Claim: Var(X) ≥ 0.
Pf: Var(X) = Σₓ (x − μ)² P(X = x) ≥ 0.
2. Claim: Var(X) = 0 iff P[X = μ] = 1.
Variance and SD
For a general distribution, Chebyshev's inequality states that every random variable X is likely to be close to E(X), give or take a few multiples of SD(X).
Chebyshev's Inequality:
For every random variable X and all k > 0:
P(|X − E(X)| ≥ k·SD(X)) ≤ 1/k².
Chebyshev's Inequality
P(|X − E(X)| ≥ k·SD(X)) ≤ 1/k²
Proof:
• Let μ = E(X) and σ = SD(X).
• Observe that |X − μ| ≥ kσ ⟺ |X − μ|² ≥ k²σ².
• The RV |X − μ|² is non-negative, so we can use Markov's inequality:
P(|X − μ|² ≥ k²σ²) ≤ E[|X − μ|²] / (k²σ²) = σ² / (k²σ²) = 1/k².
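The bound can be checked empirically; a minimal sketch, assuming numpy and using an exponential distribution as an arbitrary example:

```python
import numpy as np

# Compare the empirical tail P(|X - mu| >= k*sigma) with
# Chebyshev's bound 1/k^2 (exponential data is an arbitrary choice).
rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=100_000)

mu, sigma = x.mean(), x.std()
for k in (1, 2, 3):
    tail = np.mean(np.abs(x - mu) >= k * sigma)
    print(f"k={k}: empirical {tail:.4f} <= bound {1 / k**2:.4f}")
```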
Variance of Indicators
Suppose IA is the indicator of an event A with probability p. Observe that IA² = IA:
[Figure: A and its complement Aᶜ; on Aᶜ, IA = 0 = IA², and on A, IA = 1 = IA².]
E(IA²) = E(IA) = P(A) = p, so:
Var(IA) = E(IA²) − E(IA)² = p − p² = p(1 − p).
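A short simulation check of Var(IA) = p(1 − p); p = 0.3 is an arbitrary illustrative value:

```python
import numpy as np

# Simulated indicator with P(A) = 0.3; its variance should be p(1-p).
rng = np.random.default_rng(1)
p = 0.3
indicator = (rng.random(1_000_000) < p).astype(float)  # I_A

print(indicator.var(), p * (1 - p))  # both close to 0.21
```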
Variance of a Sum of Independent Random Variables
Claim: if X1, X2, …, Xn are independent, then:
Var(X1 + X2 + … + Xn) = Var(X1) + Var(X2) + … + Var(Xn).
Pf: It suffices to prove the claim for 2 random variables:
E[(X + Y − E(X + Y))²] = E[(X − E(X) + Y − E(Y))²]
= E[(X − E(X))²] + 2·E[(X − E(X))(Y − E(Y))] + E[(Y − E(Y))²]
= Var(X) + Var(Y) + 2·E[X − E(X)]·E[Y − E(Y)]   (by the multiplication rule for independent RVs)
= Var(X) + Var(Y) + 0.
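A simulation check of the additivity claim; independent fair dice are an illustrative choice:

```python
import numpy as np

# Var(X + Y) vs Var(X) + Var(Y) for independent fair dice.
rng = np.random.default_rng(2)
x = rng.integers(1, 7, size=1_000_000)
y = rng.integers(1, 7, size=1_000_000)  # independent of x

print(np.var(x + y), np.var(x) + np.var(y))  # both close to 35/6 ≈ 5.83
```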
Variance and Mean under Scaling and Shifts
• Claim: SD(aX + b) = |a|·SD(X).
• Proof:
Var[aX + b] = E[(aX + b − aμ − b)²] = E[a²(X − μ)²] = a²σ²,
so SD(aX + b) = √(a²σ²) = |a|·σ.
• Corollary: If a random variable X has E(X) = μ and SD(X) = σ > 0, then X* = (X − μ)/σ has E(X*) = 0 and SD(X*) = 1.
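A sketch of the standardization in the corollary, using arbitrary exponential data:

```python
import numpy as np

# Standardize X* = (X - mu) / sigma and check E(X*) = 0, SD(X*) = 1.
rng = np.random.default_rng(3)
x = rng.exponential(scale=2.0, size=100_000)  # arbitrary distribution

x_star = (x - x.mean()) / x.std()
print(x_star.mean(), x_star.std())  # approximately 0 and 1
```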
Square Root Law
Let X1, X2, …, Xn be independent random variables with the same distribution as X, let
Sn = Σ_{i=1}^{n} Xi
be their sum, and let X̄n = Sn/n be their average. Then:
E(Sn) = n·E(X) and SD(Sn) = √n·SD(X);
E(X̄n) = E(X) and SD(X̄n) = SD(X)/√n.
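A minimal simulation of the √n scaling for the sample mean, with fair-die rolls as the illustrative distribution:

```python
import numpy as np

# SD of the average of n fair-die rolls should be sigma / sqrt(n).
rng = np.random.default_rng(4)
sigma = np.sqrt(35 / 12)  # SD of a single fair die roll

for n in (10, 100, 1000):
    means = rng.integers(1, 7, size=(20_000, n)).mean(axis=1)
    print(n, means.std(), sigma / np.sqrt(n))
```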
Weak Law of Large Numbers
Thm: Let X1, X2, … be a sequence of independent random variables with the same distribution. Let μ = E(Xᵢ) denote the common expected value, and let
X̄n = (X1 + X2 + … + Xn)/n.
Then for every ε > 0:
P(|X̄n − μ| ≤ ε) → 1 as n → ∞.
Weak Law of Large Numbers
Proof: Let μ = E(Xᵢ) and σ = SD(Xᵢ). Then from the square root law we have:
E(X̄n) = μ and SD(X̄n) = σ/√n.
Now Chebyshev's inequality gives us:
P(|X̄n − μ| ≥ ε) = P(|X̄n − μ| ≥ (ε√n/σ)·(σ/√n)) ≤ σ²/(ε²·n).
For a fixed ε the right-hand side tends to 0 as n tends to ∞.
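A running-average simulation illustrating the theorem, using die rolls with μ = 3.5:

```python
import numpy as np

# Running average of die rolls; by the weak law it concentrates at 3.5.
rng = np.random.default_rng(5)
rolls = rng.integers(1, 7, size=100_000)
running_avg = np.cumsum(rolls) / np.arange(1, rolls.size + 1)

for n in (10, 1_000, 100_000):
    print(n, running_avg[n - 1])  # drifts toward 3.5 as n grows
```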
The Normal Approximation
• Let Sn = X1 + … + Xn be a sum of independent random variables with the same distribution.
• Then for large n, the distribution of Sn is approximately normal with mean E(Sn) = n·μ and SD(Sn) = σ·√n,
• where μ = E(Xᵢ) and σ = SD(Xᵢ).
In other words, for large n the standardized sum (Sn − nμ)/(σ√n) is approximately a standard normal variable.
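A sketch comparing a standardized sum of die rolls with the standard normal; scipy is an assumed dependency here:

```python
import numpy as np
from scipy.stats import norm

# Standardize S_n for n = 30 die rolls and compare one CDF value
# with the standard normal (scipy.stats.norm is an assumed dependency).
rng = np.random.default_rng(6)
n, mu, sigma = 30, 3.5, np.sqrt(35 / 12)

s = rng.integers(1, 7, size=(100_000, n)).sum(axis=1)
z = (s - n * mu) / (sigma * np.sqrt(n))

print(np.mean(z <= 1.0), norm.cdf(1.0))  # both close to 0.84
```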
Sums of repeated independent random variables
Suppose Xᵢ represents the number obtained on the i-th roll of a die. Then Xᵢ has a uniform distribution on the set {1, 2, 3, 4, 5, 6}.
[Figure: Distribution of X1 — uniform bars of height 1/6 on the values 1 to 6.]
Sum of two dice
We can obtain the distribution of S2 = X1 + X2 by the convolution formula:
P(S2 = k) = Σ_{i=1}^{k−1} P(X1 = i)·P(X2 = k − i | X1 = i)
= Σ_{i=1}^{k−1} P(X1 = i)·P(X2 = k − i)   (by independence).
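The same convolution can be sketched with np.convolve (index 0 of the die array corresponds to the value 1):

```python
import numpy as np

# Distribution of S2 = X1 + X2 for two fair dice via convolution.
die = np.full(6, 1 / 6)        # P(X = 1), ..., P(X = 6)
s2 = np.convolve(die, die)     # P(S2 = 2), ..., P(S2 = 12)

for k, p in enumerate(s2, start=2):
    print(k, round(p, 4))      # peak at P(S2 = 7) = 6/36
```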
[Figure: Distribution of S2 — triangular histogram on the values 2 to 12, peaking at 7.]
Sum of four dice
We can obtain the distribution of S4 = X1 + X2 + X3 + X4 = S2 + S'2, again by the convolution formula:
P(S4 = k) = Σ_{i=1}^{k−1} P(S2 = i)·P(S'2 = k − i | S2 = i)
= Σ_{i=1}^{k−1} P(S2 = i)·P(S'2 = k − i)   (by independence of S2 and S'2).
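Iterating the convolution doubles the number of dice each time, which is one way to reproduce the sequence of histograms below; a minimal sketch:

```python
import numpy as np

# Repeated convolution: S_{2n} = S_n + S'_n for fair dice.
die = np.full(6, 1 / 6)
dist, n = die, 1
while n < 32:
    dist = np.convolve(dist, dist)  # distribution of S_{2n}
    n *= 2
    print(f"S{n}: values {n}..{6 * n}, max probability {dist.max():.4f}")
```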
[Figure: Distribution of S4 — values 4 to 24.]
[Figure: Distribution of S8 — values 8 to 48.]
[Figure: Distribution of S16 — values 16 to 96.]
[Figure: Distribution of S32 — values 32 to 192.]
[Figure: Distribution of X1 — a second example, a non-uniform distribution on the values 1, 2, 3.]
[Figure: Distribution of S2 — values 2 to 6.]
[Figure: Distribution of S4 — values 4 to 12.]
[Figure: Distribution of S8 — values 8 to 24.]
[Figure: Distribution of S16 — values 16 to 48.]
[Figure: Distribution of S32 — values 32 to 96.]
[Figure: Distribution of X1 — a third example, a distribution on the values 0 to 5.]
[Figure: Distribution of S2 — values 0 to 10.]
[Figure: Distribution of S4 — values 0 to 20.]
[Figure: Distribution of S8 — values 0 to 40.]
[Figure: Distribution of S16 — values 0 to 80.]
[Figure: Distribution of S32 — values 0 to 160.]