Recursive Estimation
Raffaello D’Andrea
Spring 2011
Problem Set:
Probability Review
Last updated: February 28, 2011
Notes:
• Notation: Unless otherwise noted, x, y, and z denote random variables, fx(x) (or the shorthand f(x)) denotes the probability density function of x, and fx|y(x|y) (or f(x|y)) denotes the conditional probability density function of x conditioned on y. The expected value is denoted by E[·], the variance is denoted by Var[·], and Pr(Z) denotes the probability that the event Z occurs.
• Please report any errors found in this problem set to the teaching assistants
(strimpe@ethz.ch or aschoellig@ethz.ch).
Problem Set
Problem 1
Prove f (x|y) = f (x) ⇔ f (y|x) = f (y).
Problem 2
Prove f (x|y, z) = f (x|z) ⇔ f (x, y|z) = f (x|z) f (y|z).
Problem 3
Come up with an example where f (x, y|z) = f (x|z) f (y|z) but f (x, y) ̸= f (x) f (y) for continuous
random variables x, y, and z.
Problem 4
Come up with an example where f (x, y|z) = f (x|z) f (y|z) but f (x, y) ̸= f (x) f (y) for discrete
random variables x, y, and z.
Problem 5
Let x and y be binary random variables: x, y ∈ {0, 1}. Prove that fx,y (x, y) ≥ fx (x) + fy (y) − 1,
which is called Bonferroni’s inequality.
Problem 6
Let y be a scalar continuous random variable with probability density function fy (y), and g
be a function transforming y to a new variable x by x = g(y). Derive the change of variables
formula, i.e. the equation for the probability density function fx (x), under the assumption that
g is continuously differentiable with dg(y)/dy < 0 for all y.
Problem 7
Let x ∈ X be a scalar valued continuous random variable with probability density function
fx (x). Prove the law of the unconscious statistician, that is, for a function g(·) show that the
expected value of y = g(x) is given by
E[y] = ∫_X g(x) fx(x) dx.
You may consider the special case where g(·) is a continuously differentiable, strictly monotonically increasing function.
Problem 8
Prove E [ax + b] = aE [x] + b, where a and b are constants.
Problem 9
Let x and y be scalar random variables. Prove that, if x and y are independent, then for any
functions g(·) and h(·), E [g(x)h(y)] = E [g(x)] E [h(y)] .
Problem 10
From above, it follows that if x and y are independent, then E [xy] = E [x] E [y]. Is the converse
true, that is, if E [xy] = E [x] E [y], does it imply that x and y are independent? If yes, prove it.
If no, find a counter-example.
Problem 11
Let x ∈ X and y ∈ Y be independent, scalar valued discrete random variables. Let z = x + y.
Prove that
fz(z) = ∑_{y∈Y} fx(z − y) fy(y) = ∑_{x∈X} fx(x) fy(z − x),

that is, the probability density function is the convolution of the individual probability density
functions.
Problem 12
Let x be a scalar continuous random variable that takes on only non-negative values. Prove
that, for a > 0,
Pr(x ≥ a) ≤ E[x]/a,

which is called Markov's inequality.
Problem 13
Let x be a scalar continuous random variable. Let x̄ = E [x], σ 2 = Var[x]. Prove that for any
k > 0,
Pr(|x − x̄| ≥ k) ≤ σ²/k²,

which is called Chebyshev's inequality.
Problem 14
Let x1 , x2 , . . . , xn be independent random variables, each having a uniform distribution over
[0, 1]. Let the random variable m be defined as the maximum of x1 , . . . , xn , that is, m =
max{x1 , x2 , . . . , xn }. Show that the cumulative distribution function of m, Fm (·), is given by
Fm(m) = mⁿ,   0 ≤ m ≤ 1.
What is the probability density function of m?
Problem 15
Prove that E[x²] ≥ (E[x])². When does the statement hold with equality?
Problem 16
Suppose that x and y are independent scalar continuous random variables. Show that
Pr(x ≤ y) = ∫_{−∞}^{∞} Fx(y) fy(y) dy.
Problem 17
Use Chebyshev’s inequality to prove the weak law of large numbers. Namely, if x1 , x2 , . . . are
independent and identically distributed with mean µ and variance σ 2 then, for any ε > 0,
Pr( |(x₁ + x₂ + · · · + xₙ)/n − µ| > ε ) → 0   as n → ∞.
Problem 18
Suppose that x is a random variable with mean 10 and variance 15. What can we say about
Pr (5 < x < 15)?
Problem 19
Let x and y be independent random variables with means µx and µy and variances σx2 and σy2 ,
respectively. Show that
Var[xy] = σx2 σy2 + µ2y σx2 + µ2x σy2 .
Problem 20
a)
Use the method presented in class to obtain a sample x of the exponential distribution
f̂x(x) = λe^{−λx} for x ≥ 0, and f̂x(x) = 0 for x < 0,
from a given sample u of a uniform distribution.
b)
Exponential random variables are used to model failures of components that do not wear,
for example solid state electronics. This is based on their property
Pr (x > s + t|x > t) = Pr (x > s) ∀ s, t ≥ 0.
A random variable with this property is said to be memoryless. Prove that a random
variable with an exponential distribution satisfies this property.
Sample solutions
Problem 1
From the definition of conditional probability, we have f(x, y) = f(x|y)f(y) and f(x, y) = f(y|x)f(x), and therefore

f(x|y)f(y) = f(y|x)f(x).    (1)

We can assume that f(x) ≠ 0 and f(y) ≠ 0; otherwise f(y|x) and f(x|y), respectively, are not defined, and the statement under consideration does not make sense.
We will check both "directions" of the equivalence statement:

f(x|y) = f(x) ⇒ f(x)f(y) = f(y|x)f(x)   (by (1)) ⇒ f(y) = f(y|x)   (since f(x) ≠ 0),
f(y|x) = f(y) ⇒ f(x|y)f(y) = f(y)f(x)   (by (1)) ⇒ f(x) = f(x|y)   (since f(y) ≠ 0).

Therefore, f(x|y) = f(x) ⇔ f(y|x) = f(y).
Problem 2
Using

f(x, y|z) = f(x|y, z)f(y|z),    (2)

both "directions" of the equivalence statement can be shown as follows:

f(x|y, z) = f(x|z) ⇒ f(x, y|z) = f(x|z)f(y|z)   (by (2)),
f(x, y|z) = f(x|z)f(y|z) ⇒ f(x|y, z)f(y|z) = f(x|z)f(y|z)   (by (2)) ⇒ f(x|y, z) = f(x|z),

since f(y|z) ≠ 0; otherwise f(x|y, z) is not defined.
Problem 3
Let x, y, z ∈ [0, 1] be continuous random variables.
Consider the joint pdf f (x, y, z) = c gx (x, z) gy (y, z) where gx and gy are functions (not necessarily pdfs) and c is a constant such that
∫₀¹ ∫₀¹ ∫₀¹ f(x, y, z) dx dy dz = 1.
Then

f(x, y|z) = f(x, y, z)/f(z) = c gx(x, z) gy(y, z) / (c Gx(z) Gy(z)),

with

Gx(z) = ∫₀¹ gx(x, z) dx,   Gy(z) = ∫₀¹ gy(y, z) dy,

since

f(z) = ∫₀¹ ∫₀¹ c gx(x, z) gy(y, z) dx dy = c ∫₀¹ Gx(z) gy(y, z) dy = c Gx(z) Gy(z).
Furthermore,

f(x|z) = f(x, z)/f(z) = ( ∫₀¹ f(x, y, z) dy ) / f(z) = c gx(x, z) Gy(z) / (c Gx(z) Gy(z)) = gx(x, z)/Gx(z),

and, by analogy,

f(y|z) = gy(y, z)/Gy(z).

Continuing the above,

f(x, y|z) = gx(x, z) gy(y, z) / (Gx(z) Gy(z)) = f(x|z) f(y|z),

that is, x and y are conditionally independent.
Now, let gx(x, z) = x + z and gy(y, z) = y + z. Then,

f(x, y) = ∫₀¹ f(x, y, z) dz = c ∫₀¹ (x + z)(y + z) dz = c ∫₀¹ ( z² + (x + y)z + xy ) dz
= c [ z³/3 + (x + y) z²/2 + xyz ]₀¹ = c ( 1/3 + (x + y)/2 + xy ),

f(x) = ∫₀¹ f(x, y) dy = c ∫₀¹ ( 1/3 + x/2 + y/2 + xy ) dy = c ( 1/3 + x/2 + 1/4 + x/2 ) = c ( x + 7/12 ),

f(y) = ∫₀¹ f(x, y) dx = c ( y + 7/12 )   (by symmetry).

Therefore,

f(x)f(y) = c² ( 49/144 + 7/12 (x + y) + xy ) ≠ f(x, y),

that is, x and y are not independent.
Problem 4
As in Problem 3, let f (x, y, z) = c gx (x, z)gy (y, z). This ensures that f (x, y|z) = f (x|z)f (y|z).
Let x, y, z ∈ {0, 1}.
Define gx and gy by value tables:

x  z  gx(x, z)        y  z  gy(y, z)
0  0  0               0  0  1
0  1  1               0  1  0
1  0  1               1  0  0
1  1  0               1  1  1

The joint pdf f(x, y, z) = ½ gx(x, z) gy(y, z):

x  y  z  f(x, y, z)
0  0  0  0
0  0  1  0
0  1  0  0
0  1  1  0.5
1  0  0  0.5
1  0  1  0
1  1  0  0
1  1  1  0

Compute f(x, y), f(x), f(y):

x  y  f(x, y)         x  y  f(x)f(y)
0  0  0               0  0  0.25
0  1  0.5             0  1  0.25
1  0  0.5             1  0  0.25
1  1  0               1  1  0.25

x  f(x)               y  f(y)
0  0.5                0  0.5
1  0.5                1  0.5
From this we see
f (x, y) ̸= f (x)f (y) .
Remark: For verification, one may also compute f (x, y|z), f (x|z), f (y|z) in order to verify
f (x, y|z) = f (x|z) f (y|z), although this holds by construction of f (x, y, z) and the derivation in
Problem 3.
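The tables can be checked by brute-force enumeration; a Python sketch with exact fractions (the dictionary representation of the pmfs is just an implementation choice):

```python
from fractions import Fraction as F
from itertools import product

# Value tables for g_x(x, z) and g_y(y, z) from the solution above.
gx = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
gy = {(0, 0): 1, (0, 1): 0, (1, 0): 0, (1, 1): 1}

# Joint pmf f(x, y, z) = (1/2) g_x(x, z) g_y(y, z).
f = {(x, y, z): F(1, 2) * gx[x, z] * gy[y, z]
     for x, y, z in product((0, 1), repeat=3)}
assert sum(f.values()) == 1                     # valid pmf

# Marginals.
f_xy = {(x, y): f[x, y, 0] + f[x, y, 1] for x, y in product((0, 1), repeat=2)}
f_x = {x: f_xy[x, 0] + f_xy[x, 1] for x in (0, 1)}
f_y = {y: f_xy[0, y] + f_xy[1, y] for y in (0, 1)}
f_z = {z: sum(f[x, y, z] for x, y in product((0, 1), repeat=2)) for z in (0, 1)}

# Conditional independence given z (holds by construction).
for x, y, z in product((0, 1), repeat=3):
    cond_x = (f[x, 0, z] + f[x, 1, z]) / f_z[z]
    cond_y = (f[0, y, z] + f[1, y, z]) / f_z[z]
    assert f[x, y, z] / f_z[z] == cond_x * cond_y

# But x and y are not (unconditionally) independent.
assert f_xy[0, 0] == 0 and f_x[0] * f_y[0] == F(1, 4)
```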
Problem 5
For all x, y ∈ {0, 1} we have, by marginalization,

fx(x) = ∑_{y∈{0,1}} fxy(x, y) = fxy(x, 0) + fxy(x, 1) = fxy(x, y) + fxy(x, 1 − y),
fy(y) = ∑_{x∈{0,1}} fxy(x, y) = fxy(x, y) + fxy(1 − x, y).

Therefore,

fx(x) + fy(y) = 2 fxy(x, y) + fxy(x, 1 − y) + fxy(1 − x, y)
= 2 fxy(x, y) + fxy(x, 1 − y) + fxy(1 − x, y) + fxy(1 − x, 1 − y) − fxy(1 − x, 1 − y)
= fxy(x, y) + [ fxy(x, y) + fxy(x, 1 − y) + fxy(1 − x, y) + fxy(1 − x, 1 − y) ] − fxy(1 − x, 1 − y),

where the bracketed sum equals ∑_{y∈{0,1}} ∑_{x∈{0,1}} fxy(x, y) = 1, so

fx(x) + fy(y) = 1 + fxy(x, y) − fxy(1 − x, 1 − y) ≤ 1 + fxy(x, y)
⇒ fxy(x, y) ≥ fx(x) + fy(y) − 1.
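Since the inequality holds for every binary joint pmf, it can be spot-checked against randomly generated pmfs; a Python sketch (the number of trials and the random-pmf construction are arbitrary choices):

```python
import random

# Randomized check of Bonferroni's inequality for binary joint pmfs:
# f_xy(x, y) >= f_x(x) + f_y(y) - 1 for all x, y in {0, 1}.
random.seed(0)
for _ in range(1000):
    w = [random.random() for _ in range(4)]
    s = sum(w)
    # random joint pmf on {0,1}^2, normalized to sum to 1
    p = {(x, y): w[2 * x + y] / s for x in (0, 1) for y in (0, 1)}
    for x in (0, 1):
        for y in (0, 1):
            px = p[x, 0] + p[x, 1]          # marginal f_x(x)
            py = p[0, y] + p[1, y]          # marginal f_y(y)
            assert p[x, y] >= px + py - 1 - 1e-12   # Bonferroni (float tolerance)
```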
Problem 6
Consider the probability that y is in a small interval [ȳ, ȳ + ∆y] (with ∆y > 0 small),

Pr(y ∈ [ȳ, ȳ + ∆y]) = ∫_ȳ^{ȳ+∆y} fy(y) dy,

which approaches fy(ȳ) ∆y in the limit as ∆y → 0.
Let x̄ := g(ȳ). For small ∆y, we have by a Taylor series approximation,

g(ȳ + ∆y) ≈ g(ȳ) + dg/dy(ȳ) ∆y = x̄ + ∆x,   where ∆x := dg/dy(ȳ) ∆y.

Since g(y) is decreasing, ∆x < 0, see Figure 1.

[Figure 1: Decreasing function g(y) on a small interval [ȳ, ȳ + ∆y], mapped to [x̄ + ∆x, x̄].]

We are interested in the probability of x ∈ [x̄ + ∆x, x̄]. For small ∆x, this probability is equal to the probability that y ∈ [ȳ, ȳ + ∆y] (since y ∈ [ȳ, ȳ + ∆y] ⇒ g(y) ∈ [g(ȳ + ∆y), g(ȳ)] ⇒ x ∈ [x̄ + ∆x, x̄] for small ∆y and ∆x).
Therefore, for small ∆y (and thus small ∆x),

Pr(y ∈ [ȳ, ȳ + ∆y]) = fy(ȳ) ∆y = Pr(x ∈ [x̄ + ∆x, x̄]) = −fx(x̄) ∆x
⇒ fy(ȳ) ∆y = −fx(x̄) ∆x
⇒ fx(x̄) = fy(ȳ) / (−∆x/∆y).

As ∆y → 0 (and thus ∆x → 0), ∆x/∆y → dg/dy(ȳ), and therefore

fx(x̄) = fy(ȳ) / (−dg/dy(ȳ)),   or   fx(x) = fy(y) / (−dg/dy(y)).
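As a numerical illustration (not part of the original solution), take y ~ Exp(1) and the decreasing transformation x = g(y) = e^{−y}. With y = −ln x, the formula gives fx(x) = fy(y)/(−dg/dy(y)) = e^{−y}/e^{−y} = 1 on (0, 1), i.e. x should be uniform. A Monte Carlo sketch in Python (sample size and bin count are arbitrary choices):

```python
import math
import random

# Change of variables with a decreasing g: y ~ Exp(1), x = exp(-y).
# The formula predicts f_x(x) = 1 on (0, 1), i.e. x is uniformly distributed.
random.seed(1)
xs = [math.exp(-random.expovariate(1.0)) for _ in range(200_000)]

# Empirical density over 10 equal-width bins should be close to 1 everywhere.
bins = [0] * 10
for x in xs:
    bins[min(int(x * 10), 9)] += 1
for count in bins:
    density = count / len(xs) * 10
    assert abs(density - 1.0) < 0.05
```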
Problem 7
From the change of variables formula (see lecture), we have

fy(y) = fx(x) / (dg/dx(x)),

since g is continuously differentiable and strictly monotonic. Furthermore, y = g(x) ⇒ dy = dg/dx(x) dx.
Let Y = g(X), that is Y = {y | ∃ x ∈ X : y = g(x)}. Then

E[y] = ∫_Y y fy(y) dy = ∫_X g(x) ( fx(x) / (dg/dx(x)) ) dg/dx(x) dx = ∫_X g(x) fx(x) dx.
Problem 8
Using the law of the unconscious statistician (Problem 7), we have for a discrete random variable x,

E[ax + b] = ∑_{x∈X} (ax + b) fx(x) = a ∑_{x∈X} x fx(x) + b ∑_{x∈X} fx(x) = a E[x] + b,

where ∑_{x∈X} x fx(x) = E[x] and ∑_{x∈X} fx(x) = 1, and for a continuous random variable x,

E[ax + b] = ∫_{−∞}^{∞} (ax + b) fx(x) dx = a ∫_{−∞}^{∞} x fx(x) dx + b ∫_{−∞}^{∞} fx(x) dx = a E[x] + b.

Problem 9
Using the law of the unconscious statistician (Problem 7) applied to the vector random variable [x, y]ᵀ with pdf fxy(x, y), we have

E[g(x)h(y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x)h(y) fxy(x, y) dx dy
= ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x)h(y) fx(x) fy(y) dx dy   (by independence of x, y)
= ( ∫_{−∞}^{∞} g(x) fx(x) dx ) ( ∫_{−∞}^{∞} h(y) fy(y) dy )
= E[g(x)] E[h(y)].

And similarly for discrete random variables.
And similarly for discrete random variables.
Problem 10
We will construct a counter-example.
• Let x be a discrete random variable that takes values −1, 0, and 1 with probabilities 1/4, 1/2, and 1/4, respectively.
• Let w be a discrete random variable taking the values -1, 1 with equal probability. Let x
and w be independent.
• Let y be a discrete random variable defined by y = xw (it follows that y can take values
-1, 0, 1).
By construction, it is clear that y depends on x. We will now show that y and x are not independent, but E [xy] = E [x] E [y].
PDF fxy(x, y):

 x   y   fxy(x, y)
−1  −1   1/8    (x = −1 (prob 1/4) ∧ w = +1 (prob 1/2))
−1   0   0      (impossible since w ≠ 0)
−1   1   1/8    (x = −1 (prob 1/4) ∧ w = −1 (prob 1/2))
 0  −1   0      (impossible)
 0   0   1/2    (x = 0, w either ±1)
 0   1   0      (impossible)
 1  −1   1/8
 1   0   0
 1   1   1/8
Marginalization fx(x), fy(y):

 x   f(x)        y   f(y)
−1   0.25       −1   0.25
 0   0.5         0   0.5
 1   0.25        1   0.25

It follows that fxy(x, y) = fx(x) fy(y) ∀ x, y does not hold. For example, fxy(−1, 0) = 0 ≠ fx(−1) fy(0) = 1/8.
Expected values:

E[xy] = ∑_x ∑_y x y fxy(x, y)
= (−1)(−1)·1/8 + (−1)(1)·1/8 + (0)(0)·1/2 + (1)(−1)·1/8 + (1)(1)·1/8
= 0,

E[x] = ∑_x x fx(x) = −1/4 + 1/4 = 0,
E[y] = 0.

So, for all x, y, E[xy] = E[x] E[y], but not fxy(x, y) = fx(x) fy(y). Hence, we have shown by counterexample that E[xy] = E[x] E[y] does not imply independence of x and y.
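The counterexample can be verified exactly by enumerating the outcomes of (x, w); a Python sketch:

```python
from fractions import Fraction as F

# Enumerate the counterexample: y = x*w with x, w independent.
px = {-1: F(1, 4), 0: F(1, 2), 1: F(1, 4)}
pw = {-1: F(1, 2), 1: F(1, 2)}

# Joint pmf of (x, y) by summing over outcomes of w.
pxy = {}
for x, p1 in px.items():
    for w, p2 in pw.items():
        pxy[x, x * w] = pxy.get((x, x * w), 0) + p1 * p2

# Marginal pmf of y.
fy = {y: sum(p for (x, yy), p in pxy.items() if yy == y) for y in (-1, 0, 1)}

E_xy = sum(x * y * p for (x, y), p in pxy.items())
E_x = sum(x * p for x, p in px.items())
E_y = sum(y * p for y, p in fy.items())
assert E_xy == E_x * E_y == 0                  # uncorrelated ...
assert pxy.get((-1, 0), 0) != px[-1] * fy[0]   # ... but not independent
```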
Problem 11
From the definition of the probability density function for discrete random variables,

fz(z̄) = Pr(z = z̄).

The probability that z takes on z̄ is given by the sum of the probabilities of all possibilities of x and y such that z̄ = x + y, that is, all ȳ ∈ Y such that x = z̄ − ȳ and y = ȳ (or, alternatively, x̄ ∈ X such that x = x̄ and y = z̄ − x̄). That is,

fz(z̄) = Pr(z = z̄) = ∑_{ȳ∈Y} Pr(x = z̄ − ȳ) · Pr(y = ȳ)   (by independence assumption)
= ∑_{ȳ∈Y} fx(z̄ − ȳ) fy(ȳ).

Rewriting the last equation yields fz(z) = ∑_{y∈Y} fx(z − y) fy(y). Similarly, fz(z) = ∑_{x∈X} fx(x) fy(z − x) may be obtained.
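As an illustration (an example not in the original text), the convolution formula applied to two fair dice reproduces the familiar pmf of their sum; a Python sketch with exact fractions:

```python
from fractions import Fraction as F
from itertools import product

# pmf of z = x + y for two independent fair dice, via the convolution formula.
fx = {k: F(1, 6) for k in range(1, 7)}
fy = dict(fx)

# f_z(z) = sum over y of f_x(z - y) f_y(y)
fz = {z: sum(fx.get(z - y, 0) * fy[y] for y in fy) for z in range(2, 13)}

# Brute-force comparison over all outcome pairs.
brute = {}
for x, y in product(range(1, 7), repeat=2):
    brute[x + y] = brute.get(x + y, 0) + fx[x] * fy[y]
assert fz == brute
assert fz[7] == F(1, 6)   # the most likely sum
```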
Problem 12
E[x] = ∫₀^∞ x f(x) dx = ∫₀^a x f(x) dx + ∫_a^∞ x f(x) dx
≥ ∫_a^∞ x f(x) dx   (the first term is ≥ 0)
≥ ∫_a^∞ a f(x) dx = a ∫_a^∞ f(x) dx = a Pr(x ≥ a),

and dividing by a > 0 gives Pr(x ≥ a) ≤ E[x]/a.
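A Monte Carlo spot check of the bound, using x ~ Exp(1) as an illustrative choice (so E[x] = 1; the thresholds tested are arbitrary):

```python
import random

# Empirical check of Markov's inequality, Pr(x >= a) <= E[x]/a, for x ~ Exp(1).
random.seed(2)
xs = [random.expovariate(1.0) for _ in range(100_000)]
mean = sum(xs) / len(xs)
for a in (0.5, 1.0, 2.0, 5.0):
    tail = sum(x >= a for x in xs) / len(xs)   # empirical Pr(x >= a)
    assert tail <= mean / a + 1e-9             # the bound holds
```

For the exponential distribution the true tail is e^{−a}, which is far below 1/a for every tested a, so the bound is satisfied with a wide margin.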
Problem 13
This inequality can be obtained using Markov's inequality (Problem 12):
• (x − x̄)² is a nonnegative random variable; apply Markov's inequality with a = k² (k > 0):

Pr((x − x̄)² ≥ k²) ≤ E[(x − x̄)²] / k² = Var[x] / k² = σ²/k².

• Since (x − x̄)² ≥ k² ⇔ |x − x̄| ≥ k, the result follows:

Pr(|x − x̄| ≥ k) ≤ σ²/k².
Problem 14
For 0 ≤ t ≤ 1, we write
Fm (t) = Pr (m ∈ [0, t]) = Pr (max{x1 , . . . , xn } ∈ [0, t])
= Pr (xi ∈ [0, t] ∀ i ∈ {1, . . . , n})
= Pr (x1 ∈ [0, t]) · Pr (x2 ∈ [0, t]) · · · Pr (xn ∈ [0, t])
= tn .
By substituting m for t, we obtain the desired result.
Since Fm(m) = ∫₀^m fm(τ) dτ, the pdf of m is obtained by differentiating,

fm(m) = dFm/dm (m) = d/dm (mⁿ) = n mⁿ⁻¹.
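A quick simulation comparing the empirical CDF of m with mⁿ (the choices n = 5, the sample size, and the test points are arbitrary):

```python
import random

# Empirical CDF of m = max(x_1, ..., x_n) for uniform x_i versus F_m(m) = m^n.
random.seed(3)
n, trials = 5, 100_000
ms = [max(random.random() for _ in range(n)) for _ in range(trials)]
for t in (0.3, 0.5, 0.8, 0.95):
    emp = sum(m <= t for m in ms) / trials   # empirical Pr(m <= t)
    assert abs(emp - t ** n) < 0.01
```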
Problem 15
Let x̄ = E[x]. Then

Var[x] = ∫_{−∞}^{∞} (x − x̄)² f(x) dx = ∫_{−∞}^{∞} x² f(x) dx − 2x̄ ∫_{−∞}^{∞} x f(x) dx + x̄² ∫_{−∞}^{∞} f(x) dx
= E[x²] − 2x̄² + x̄² = E[x²] − x̄²,

where ∫ x f(x) dx = x̄ and ∫ f(x) dx = 1. Since Var[x] ≥ 0, we have

E[x²] = x̄² + Var[x] ≥ x̄²,

with equality if and only if Var[x] = 0.
Problem 16
For fixed ȳ,
Pr (x ≤ ȳ) = Fx (ȳ) .
The probability that y ∈ [ȳ, ȳ + ∆y] is

Pr(y ∈ [ȳ, ȳ + ∆y]) = ∫_ȳ^{ȳ+∆y} fy(y) dy ≈ fy(ȳ) ∆y   for small ∆y.

Therefore, the probability that x ≤ ȳ and y ∈ [ȳ, ȳ + ∆y] is

Pr(x ≤ ȳ ∧ y ∈ [ȳ, ȳ + ∆y]) = Pr(x ≤ ȳ) · Pr(y ∈ [ȳ, ȳ + ∆y]) ≈ Fx(ȳ) fy(ȳ) ∆y    (3)

by independence of x and y.
Now, since we are interested in Pr(x ≤ y), i.e. in the probability of x being less than y for any value of y and not just a fixed ȳ, we can sum (3) over all possible intervals [ȳᵢ, ȳᵢ + ∆y],

Pr(x ≤ y) = lim_{N→∞} ∑_{i=−N}^{N} Fx(ȳᵢ) fy(ȳᵢ) ∆y.

By letting ∆y → 0, we obtain

Pr(x ≤ y) = ∫_{−∞}^{∞} Fx(y) fy(y) dy.
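The formula can be spot-checked by simulation. As an illustrative choice (not from the problem set), take x ~ N(0, 1) and y ~ N(1, 1); then x − y ~ N(−1, 2), so the integral evaluates in closed form to Pr(x ≤ y) = Φ(1/√2) = (1 + erf(1/2))/2:

```python
import math
import random

# Monte Carlo estimate of Pr(x <= y) for independent x ~ N(0,1), y ~ N(1,1),
# compared with the closed-form value Phi(1/sqrt(2)).
random.seed(4)
trials = 200_000
hits = sum(random.gauss(0, 1) <= random.gauss(1, 1) for _ in range(trials))
exact = 0.5 * (1 + math.erf(0.5))   # Phi(1/sqrt(2)), approximately 0.760
assert abs(hits / trials - exact) < 0.01
```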
Problem 17
Consider the random variable (x₁ + x₂ + · · · + xₙ)/n. It has mean

E[(x₁ + x₂ + · · · + xₙ)/n] = (1/n)(E[x₁] + · · · + E[xₙ]) = (1/n)(n µ) = µ

and variance

Var[(x₁ + x₂ + · · · + xₙ)/n] = E[((x₁ + x₂ + · · · + xₙ)/n − µ)²]
= (1/n²) E[(x₁ − µ)² + · · · + (xₙ − µ)² + “cross terms”].

The expected value of the cross terms is zero; for example, for i ≠ j,

E[(xᵢ − µ)(xⱼ − µ)] = E[xᵢ − µ] · E[xⱼ − µ] = 0   (by independence).

Hence,

Var[(x₁ + x₂ + · · · + xₙ)/n] = (1/n²)(Var[x₁] + · · · + Var[xₙ]) = nσ²/n² = σ²/n.

Applying Chebyshev's inequality with ε > 0 gives

Pr(|(x₁ + x₂ + · · · + xₙ)/n − µ| ≥ ε) ≤ σ²/(n ε²) → 0   as n → ∞
⇒ Pr(|(x₁ + x₂ + · · · + xₙ)/n − µ| > ε) → 0   as n → ∞.

Remark: Changing “≥” to “>” is valid here, since

Pr(|(x₁ + · · · + xₙ)/n − µ| ≥ ε) = Pr(|(x₁ + · · · + xₙ)/n − µ| > ε) + Pr(|(x₁ + · · · + xₙ)/n − µ| = ε),

and the last term is zero when (x₁ + · · · + xₙ)/n is a continuous random variable.
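The shrinking deviation probability can be observed directly by simulation (uniform [0, 1] samples with µ = 1/2, ε = 0.05, and the trial counts are illustrative choices):

```python
import random

# Weak law of large numbers: Pr(|sample mean - mu| > eps) shrinks with n.
random.seed(5)
eps, trials = 0.05, 2000

def deviation_prob(n):
    """Empirical Pr(|mean of n uniforms - 0.5| > eps) over many trials."""
    count = 0
    for _ in range(trials):
        mean = sum(random.random() for _ in range(n)) / n
        count += abs(mean - 0.5) > eps
    return count / trials

p10, p1000 = deviation_prob(10), deviation_prob(1000)
assert p1000 < p10       # probability decreases with n ...
assert p1000 < 0.01      # ... and is already tiny at n = 1000
```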
Problem 18
First, we rewrite

Pr(5 < x < 15) = Pr(|x − 10| < 5) = 1 − Pr(|x − 10| ≥ 5).

Using Chebyshev's inequality with x̄ = 10, σ² = 15, k = 5 yields

Pr(|x − 10| ≥ 5) ≤ 15/5² = 15/25 = 3/5.

Therefore, we can say that

Pr(5 < x < 15) ≥ 1 − 3/5 = 2/5.
Problem 19
First, we compute the mean of xy using independence (Problem 9),

E[xy] = E[x] · E[y] = µx µy.

Hence, again using independence and the law of the unconscious statistician (Problem 7),

Var[xy] = E[(xy)²] − (µx µy)² = E[x²] E[y²] − µx² µy².

Using E[x²] = σx² + µx² and E[y²] = σy² + µy², we obtain the result

Var[xy] = (σx² + µx²)(σy² + µy²) − µx² µy² = σx² σy² + µy² σx² + µx² σy².
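A Monte Carlo spot check of the formula with illustrative parameters (not from the problem set): µx = 2, σx² = 1, µy = −1, σy² = 4, for which the formula predicts Var[xy] = 1·4 + 1·1 + 4·4 = 21:

```python
import random

# Empirical check of Var[xy] = sx2*sy2 + my^2*sx2 + mx^2*sy2
# for independent x ~ N(2, 1) and y ~ N(-1, 4).
random.seed(6)
n = 500_000
prods = [random.gauss(2, 1) * random.gauss(-1, 2) for _ in range(n)]
mean = sum(prods) / n
var = sum((p - mean) ** 2 for p in prods) / n
exact = 1 * 4 + (-1) ** 2 * 1 + 2 ** 2 * 4    # = 21
assert abs(var - exact) < 0.5
```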
Problem 20
a)
The cumulative distribution function is

F̂x(x) = ∫₀ˣ λe^{−λt} dt = [−e^{−λt}]₀ˣ = 1 − e^{−λx},   x ≥ 0.

Let u be a sample from a uniform distribution. According to the method presented in class, we obtain a sample x by solving u = F̂x(x) for x, that is,

u = 1 − e^{−λx} ⇔ e^{−λx} = 1 − u ⇔ −λx = ln(1 − u) ⇔ x = −ln(1 − u)/λ.

Notice the following properties: u → 0 ⇒ x → 0, and u → 1 ⇒ x → +∞.
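The resulting sampler is easy to validate empirically; a Python sketch (λ = 2 and the sample size are arbitrary choices, checked against the exponential mean 1/λ):

```python
import math
import random

# Inverse-transform sampling of Exp(lambda): x = -ln(1 - u)/lambda for u ~ U[0,1).
random.seed(7)
lam = 2.0
xs = [-math.log(1 - random.random()) / lam for _ in range(200_000)]

assert all(x >= 0 for x in xs)                     # exponential support
assert abs(sum(xs) / len(xs) - 1 / lam) < 0.01     # sample mean near 1/lambda
```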
b)
For s ≥ 0, t ≥ 0,

Pr(x > s + t | x > t) = Pr(x > s + t ∧ x > t) / Pr(x > t) = Pr(x > s + t) / Pr(x > t)
= ( ∫_{s+t}^∞ λe^{−λx} dx ) / ( ∫_t^∞ λe^{−λx} dx ) = [−e^{−λx}]_{s+t}^∞ / [−e^{−λx}]_t^∞ = e^{−λ(s+t)} / e^{−λt}
= e^{−λs} = ∫_s^∞ λe^{−λx} dx = Pr(x > s).