1.7 Functions of Random Variables
If X is a random variable with cdf F_X(x), then any function of X, say Y = g(X), is also a random variable. The question then is "what is the distribution of Y?"
The function y = g(x) is a mapping from the induced sample space of the random variable X, denoted X, to a new sample space Y of the random variable Y, that is,
g(x) : X → Y.
The inverse mapping g^{-1} acts from Y to X and we can write
g^{-1}(A) = {x ∈ X : g(x) ∈ A}, where A ⊂ Y.
Then we have
P_Y(Y ∈ A) = P_Y(g(X) ∈ A) = P_X({x ∈ X : g(x) ∈ A}) = P_X(X ∈ g^{-1}(A)).
The following theorem relates the cumulative distribution functions
of X and Y = g(X).
Theorem 1.7. Let X have cdf F_X(x), let Y = g(X), and let the domain and codomain of g, respectively, be
X = {x : f_X(x) > 0} and Y = {y : y = g(x) for some x ∈ X}.
(a) If g is an increasing function on X, then F_Y(y) = F_X(g^{-1}(y)) for y ∈ Y.
(b) If g is a decreasing function on X, then F_Y(y) = 1 − F_X(g^{-1}(y)) for y ∈ Y.
Proof. The cdf of Y = g(X) can be written as
F_Y(y) = P_Y(Y ≤ y) = P_Y(g(X) ≤ y) = P_X({x ∈ X : g(x) ≤ y}) = ∫_{{x∈X : g(x)≤y}} f_X(x) dx.
(a) If g is increasing, then
{x ∈ X : g(x) ≤ y} = {x ∈ X : x ≤ g^{-1}(y)}.
So we can write
F_Y(y) = ∫_{{x∈X : x≤g^{-1}(y)}} f_X(x) dx = ∫_{-∞}^{g^{-1}(y)} f_X(x) dx = F_X(g^{-1}(y)).
(b) Now, if g is decreasing, then
{x ∈ X : g(x) ≤ y} = {x ∈ X : x ≥ g^{-1}(y)}.
So we can write
F_Y(y) = ∫_{{x∈X : x≥g^{-1}(y)}} f_X(x) dx = ∫_{g^{-1}(y)}^{∞} f_X(x) dx = 1 − F_X(g^{-1}(y)).
Example 1.12. Find the distribution of Y = g(X) = −log X, where X ∼ U(0, 1). The cdf of X is
F_X(x) = 0 for x ≤ 0,   F_X(x) = x for 0 < x < 1,   F_X(x) = 1 for x ≥ 1.
On X = (0, 1) the function g(x) = −log x is decreasing, and its range is Y = (0, ∞).
For y > 0, y = −log x implies that x = e^{−y}, i.e., g^{-1}(y) = e^{−y}, and
F_Y(y) = 1 − F_X(g^{-1}(y)) = 1 − F_X(e^{−y}) = 1 − e^{−y}.
Hence we may write
F_Y(y) = (1 − e^{−y}) I_{(0,∞)}(y).
This is the cdf of the exponential distribution with λ = 1.
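As a quick numerical sanity check (a minimal sketch, not part of the original text; numpy, the seed and the sample size are arbitrary choices), we can simulate X ∼ U(0, 1) and compare the empirical cdf of Y = −log X with 1 − e^{−y}:

import numpy as np

# Monte Carlo check of Example 1.12: if X ~ U(0,1), then Y = -log(X)
# should have cdf F_Y(y) = 1 - exp(-y), i.e. Y ~ Exp(1).
rng = np.random.default_rng(0)
y = -np.log(rng.uniform(size=100_000))

for q in (0.5, 1.0, 2.0):
    print(q, (y <= q).mean(), 1 - np.exp(-q))   # empirical vs theoretical cdf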
For continuous rvs we have the following result.
Theorem 1.8. Let X have pdf f_X(x) and let Y = g(X), where g is a monotone function. Suppose that f_X(x) is continuous on its support X = {x : f_X(x) > 0} and that g^{-1}(y) has a continuous derivative on the support Y = {y : y = g(x) for some x ∈ X}. Then the pdf of Y is given by
f_Y(y) = f_X(g^{-1}(y)) |d/dy g^{-1}(y)| I_Y(y).
Proof.
f_Y(y) = d/dy F_Y(y)
       = d/dy F_X(g^{-1}(y))              if g is increasing,
         d/dy [1 − F_X(g^{-1}(y))]         if g is decreasing;
       = f_X(g^{-1}(y)) · d/dy g^{-1}(y)    if g is increasing,
         −f_X(g^{-1}(y)) · d/dy g^{-1}(y)   if g is decreasing.
Note that when g(x) is decreasing (increasing), so is g^{-1}(y); in the decreasing case d/dy g^{-1}(y) < 0, so in both cases the expression equals f_X(g^{-1}(y)) |d/dy g^{-1}(y)|, which is the statement of the theorem.
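As an illustration (a minimal symbolic sketch, not part of the original text; the use of sympy is an assumption), applying the formula of Theorem 1.8 to the transformation of Example 1.12, where f_X(x) = 1 on (0, 1) and g^{-1}(y) = e^{−y}, recovers the Exp(1) density:

import sympy as sp

# Apply Theorem 1.8 to Y = -log(X) with X ~ U(0,1):
# f_Y(y) = f_X(g^{-1}(y)) * |d/dy g^{-1}(y)|, with f_X = 1 on (0,1) and g^{-1}(y) = exp(-y).
y = sp.symbols('y', positive=True)
g_inv = sp.exp(-y)
f_Y = 1 * sp.Abs(sp.diff(g_inv, y))
print(sp.simplify(f_Y))   # exp(-y), the Exp(1) pdf on (0, infinity)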
Example 1.13. Suppose that Z ∼ N(0, 1). What is the distribution of Y = Z²?
For y > 0, the cdf of Y = Z² is
F_Y(y) = P_Y(Y ≤ y) = P_Y(Z² ≤ y) = P_Z(−√y ≤ Z ≤ √y) = F_Z(√y) − F_Z(−√y).
The pdf can now be obtained by differentiation:
f_Y(y) = d/dy F_Y(y) = d/dy [F_Z(√y) − F_Z(−√y)] = (1/(2√y)) f_Z(√y) + (1/(2√y)) f_Z(−√y).
Now, for the standard normal distribution we have
f_Z(z) = (1/√(2π)) e^{−z²/2},  −∞ < z < ∞.
This gives
f_Y(y) = (1/(2√y)) (1/√(2π)) e^{−(√y)²/2} + (1/(2√y)) (1/√(2π)) e^{−(−√y)²/2}
       = (1/√y) (1/√(2π)) e^{−y/2},  0 < y < ∞.
This is the pdf of a chi-squared random variable with one degree of freedom. That is, if Z ∼ N(0, 1), then Z² ∼ χ²_1.
Note that g(Z) = Z² is not a monotone function, but the range of Z, (−∞, ∞), can be partitioned into subsets on which it is monotone.
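As a numerical check (a minimal sketch, not part of the original text; numpy/scipy and the sample size are arbitrary choices), we can simulate Z² and compare its empirical cdf with the χ²_1 cdf:

import numpy as np
from scipy import stats

# Monte Carlo check of Example 1.13: if Z ~ N(0,1), then Z**2 ~ chi-squared with 1 df.
rng = np.random.default_rng(1)
y = rng.standard_normal(200_000) ** 2

for q in (0.5, 1.0, 3.0):
    print(q, np.mean(y <= q), stats.chi2(df=1).cdf(q))   # empirical vs chi2(1) cdf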
1.8 Two-Dimensional Random Variables
Definition 1.8. Let Ω be a sample space and let X1, X2 be functions, each assigning a real number X1(ω), X2(ω) to every outcome ω ∈ Ω, that is, X1 : Ω → X1 ⊂ R and X2 : Ω → X2 ⊂ R. Then the pair X = (X1, X2) is called a two-dimensional random variable.
The induced sample space (range) of the two-dimensional random variable is
X = {(x1, x2) : x1 ∈ X1, x2 ∈ X2} ⊂ R².
We will denote two-dimensional random variables by bold capital letters.
Definition 1.9. The cumulative distribution function of a two-dimensional rv X = (X1, X2) is
F_X(x1, x2) = P_X(X1 ≤ x1, X2 ≤ x2).   (1.9)
1.8.1 Discrete Two-Dimensional Random Variables
If all values of X = (X1, X2) are countable, i.e., the values are in
the range
X = {(x1i, x2j ), i = 1, 2, . . . , j = 1, 2, . . .}
then the variable is discrete. The cdf of a discrete rv X = (X1, X2) is
F_X(x1, x2) = Σ_{x2j ≤ x2} Σ_{x1i ≤ x1} p_X(x1i, x2j),
where p_X(x1i, x2j) denotes the joint probability mass function and
p_X(x1i, x2j) = P_X(X1 = x1i, X2 = x2j).
As in the univariate case, the joint pmf satisfies the following conditions.
1. p_X(x1i, x2j) ≥ 0, for all i, j;
2. Σ_{x2j ∈ X2} Σ_{x1i ∈ X1} p_X(x1i, x2j) = 1.
Example 1.14. Consider a discrete bi-variate uniform distribution
defined on a rectangular grid
X = {(x1i, x2j ) : x1i ∈ X1 , x2j ∈ X2 },
where X1 = {x11, . . . , x1n} and X2 = {x21, . . . , x2m}. Then, the
number of values is n × m = N and
p_X(x1i, x2j) = 1/N.
Expectations of functions of bivariate random variables are calculated in the same way as for univariate rvs. Let g(x1, x2) be a real-valued function defined on X. Then g(X) = g(X1, X2) is a rv and its expectation is
E[g(X)] = Σ_{(x1, x2) ∈ X} g(x1, x2) p_X(x1, x2).
Example 1.15. Let X1 and X2 be random variables as defined in
Example 1.14 such that X1 = {1, 2, 3, 4} and X2 = {3, 6, 9}. Then,
for example, for g(X1 , X2) = X1 X2 we obtain
E[g(X)] = (1/12)(1 × 3 + 1 × 6 + 1 × 9 + · · · + 4 × 9) = 15.
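A quick enumeration (a minimal sketch, not part of the original text) confirms this value:

import itertools

# Numeric check of Example 1.15: E[X1*X2] for the uniform pmf on the
# 4 x 3 grid {1,2,3,4} x {3,6,9}, each point having mass 1/12.
X1_vals, X2_vals = [1, 2, 3, 4], [3, 6, 9]
N = len(X1_vals) * len(X2_vals)
print(sum(x1 * x2 for x1, x2 in itertools.product(X1_vals, X2_vals)) / N)   # 15.0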
Marginal pmfs
Each of the components of the two-dimensional rv is a random variable in its own right, and so we may be interested in calculating its probabilities, for example P_{X1}(X1 = x1). Such a univariate pmf is derived in the context of the distribution of the other random variable, and we call it the marginal pmf.
Theorem 1.9. Let X = (X1, X2) be a discrete bivariate random variable with joint pmf p_X(x1, x2). Then the marginal pmfs of X1 and X2, p_{X1} and p_{X2}, are given respectively by
p_{X1}(x1) = P_{X1}(X1 = x1) = Σ_{x2 ∈ X2} p_X(x1, x2)  and
p_{X2}(x2) = P_{X2}(X2 = x2) = Σ_{x1 ∈ X1} p_X(x1, x2).
Proof. For X1: let us denote A_{x1} = {(x1, x2) : x2 ∈ X2}. Then, for any x1 ∈ X1 we may write
P(X1 = x1) = P(X1 = x1, x2 ∈ X2)
           = P((X1, X2) ⊆ A_{x1})
           = Σ_{(x1, x2) ∈ A_{x1}} P(X1 = x1, X2 = x2)
           = Σ_{x2 ∈ X2} p_X(x1, x2).
For X2 the proof is similar.
Example 1.16. Students in a class of 100 were classified according to gender (G) and smoking (S) as follows:

                    Smoking
                  s     q     n
Gender  male     20    32     8    60
        female   10     5    25    40
                 30    37    33   100

where s, q and n denote the smoking status: "now smokes", "did smoke but quit" and "never smoked", respectively. Find the probability that a randomly selected student
1. is a male;
2. is a male smoker;
3. is either a smoker or did smoke but quit;
4. is a female who is a smoker or did smoke but quit.
Denote by X = (G, S) a two-dimensional rv, where G : Gender → {0, 1} and S : Smoking → {1, 2, 3}. In the following table we have the distribution of X as well as the marginal distributions of G and of S.

  G \ S          1      2      3    P(G = g_i)
    0          0.20   0.32   0.08      0.60
    1          0.10   0.05   0.25      0.40
  P(S = s_j)   0.30   0.37   0.33      1
Hence, we obtain
1. PG (G = 0) = 0.6.
2. PX (G = 0, S = 1) = 0.2.
3. PS (S = 1) + PS (S = 2) = 0.30 + 0.37 = 0.67.
4. PX (G = 1, S = 1)+PX (G = 1, S = 2) = 0.10+0.05 = 0.15.
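These calculations can be reproduced directly from the joint table (a minimal sketch, not part of the original text; the use of numpy is an assumption):

import numpy as np

# Joint pmf of (G, S) from Example 1.16; rows are G = 0 (male), 1 (female),
# columns are S = 1 (smokes), 2 (quit), 3 (never).
p = np.array([[0.20, 0.32, 0.08],
              [0.10, 0.05, 0.25]])

p_G = p.sum(axis=1)          # marginal of G: [0.60, 0.40]
p_S = p.sum(axis=0)          # marginal of S: [0.30, 0.37, 0.33]

print(p_G[0])                # 1. P(G = 0)            = 0.60
print(p[0, 0])               # 2. P(G = 0, S = 1)     = 0.20
print(p_S[0] + p_S[1])       # 3. P(S = 1) + P(S = 2) = 0.67
print(p[1, 0] + p[1, 1])     # 4. P(G = 1, S in {1,2}) = 0.15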
1.8.2 Continuous Two-Dimensional Random Variables
If the values of X = (X1, X2 ) are elements of an uncountable set
in the Euclidean plane, then the variable is jointly continuous. For
example the values might be in the range
X = {(x1, x2) : a ≤ x1 ≤ b, c ≤ x2 ≤ d}
for some real a, b, c, d.
The cdf of a continuous rv X = (X1, X2) is defined as
F_X(x1, x2) = P_X(X1 ≤ x1, X2 ≤ x2) = ∫_{-∞}^{x2} ∫_{-∞}^{x1} f_X(t1, t2) dt1 dt2,   (1.10)
where f_X(·, ·) is the probability density function such that
1. f_X(x1, x2) ≥ 0 for all (x1, x2) ∈ R²;
2. ∫_{-∞}^{∞} ∫_{-∞}^{∞} f_X(x1, x2) dx1 dx2 = 1.
Equation (1.10) implies that
∂²F_X(x1, x2) / (∂x1 ∂x2) = f_X(x1, x2).   (1.11)
Also, for any constants a, b, c and d such that a ≤ b and c ≤ d, the probability that the bivariate rv falls in the rectangle (a, b) × (c, d) is
P_X(a ≤ X1 ≤ b, c ≤ X2 ≤ d) = ∫_c^d ∫_a^b f_X(x1, x2) dx1 dx2.
The marginal pdfs of X1 and X2 are defined similarly to the discrete case, here using integrals:
f_{X1}(x1) = ∫_{-∞}^{∞} f_X(x1, x2) dx2, for −∞ < x1 < ∞,
f_{X2}(x2) = ∫_{-∞}^{∞} f_X(x1, x2) dx1, for −∞ < x2 < ∞.
Example 1.17. Calculate P(X ⊆ A), where A = {(x1, x2) : x1 + x2 ≥ 1} and the joint pdf of X = (X1, X2) is defined by
f_X(x1, x2) = 6 x1 x2² for 0 < x1 < 1, 0 < x2 < 1, and f_X(x1, x2) = 0 otherwise.
The probability is a double integral of the pdf over the region A. The region is, however, limited by the domain in which the pdf is positive. We can write
A = {(x1, x2) : x1 + x2 ≥ 1, 0 < x1 < 1, 0 < x2 < 1}
  = {(x1, x2) : x1 ≥ 1 − x2, 0 < x1 < 1, 0 < x2 < 1}
  = {(x1, x2) : 1 − x2 ≤ x1 < 1, 0 < x2 < 1}.
Hence, the probability is
P(X ⊆ A) = ∫∫_A f_X(x1, x2) dx1 dx2 = ∫_0^1 ∫_{1−x2}^1 6 x1 x2² dx1 dx2 = 0.9.
Also, we can calculate the marginal pdfs:
f_{X1}(x1) = ∫_0^1 6 x1 x2² dx2 = 2 x1 x2³ |_0^1 = 2 x1,
f_{X2}(x2) = ∫_0^1 6 x1 x2² dx1 = 3 x1² x2² |_0^1 = 3 x2².
These functions allow us to calculate probabilities involving only one variable. For example,
P_{X1}(1/4 < X1 < 1/2) = ∫_{1/4}^{1/2} 2 x1 dx1 = 3/16.
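Both results can be verified numerically (a minimal sketch, not part of the original text; scipy's quadrature routines are an assumed tool):

from scipy import integrate

# Numeric checks for Example 1.17, where f(x1, x2) = 6*x1*x2**2 on (0,1) x (0,1).
f = lambda x1, x2: 6 * x1 * x2**2

# P(X1 + X2 >= 1): inner variable x1 runs from 1 - x2 to 1, outer x2 from 0 to 1.
p_A, _ = integrate.dblquad(f, 0, 1, lambda x2: 1 - x2, lambda x2: 1)
print(p_A)                                                # 0.9

# P(1/4 < X1 < 1/2) from the marginal pdf f_X1(x1) = 2*x1.
p_B, _ = integrate.quad(lambda x1: 2 * x1, 0.25, 0.5)
print(p_B)                                                # 0.1875 = 3/16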
Analogously to the discrete case, we define the expectation of a real function g(X) of the bivariate rv X = (X1, X2) as
E[g(X)] = ∫_{-∞}^{∞} ∫_{-∞}^{∞} g(x1, x2) f_X(x1, x2) dx1 dx2.
In both cases, discrete and continuous, the following linear property for the expectation holds:
E[a g(X) + b h(X) + c] = a E[g(X)] + b E[h(X)] + c,   (1.12)
where a, b and c are constants and g and h are some functions of the bivariate rv X = (X1, X2).
1.8.3 Conditional Distributions
Definition 1.10. Let X = (X1 , X2) denote a continuous bivariate
rv with joint pdf fX (x1, x2) and marginal pdfs fX1 (x1) and fX2 (x2).
For any x1 such that fX1 (x1) > 0, the conditional pdf of X2 given
that X1 = x1 is the function of x2 defined by
f_{X2|X1}(x2|x1) = f_X(x1, x2) / f_{X1}(x1).
Analogously, we define the conditional pdf of X1 given X2 = x2:
f_{X1|X2}(x1|x2) = f_X(x1, x2) / f_{X2}(x2).
It is easy to verify that these functions are pdfs. For example, for X2
we can write
∫_{X2} f_{X2|X1}(x2|x1) dx2 = ∫_{X2} [f_X(x1, x2) / f_{X1}(x1)] dx2
                            = [∫_{X2} f_X(x1, x2) dx2] / f_{X1}(x1)
                            = f_{X1}(x1) / f_{X1}(x1) = 1.
Note that the marginal pdf of X2 (similarly of X1) can be written as
f_{X2}(x2) = ∫_{-∞}^{∞} f_{X2|X1}(x2|x1) f_{X1}(x1) dx1,
where the integrand f_{X2|X1}(x2|x1) f_{X1}(x1) equals the joint pdf f_X(x1, x2).
Example 1.18. For the random variables defined in Example 1.17 the conditional pdfs are
f_{X1|X2}(x1|x2) = f_X(x1, x2) / f_{X2}(x2) = 6 x1 x2² / (3 x2²) = 2 x1
and
f_{X2|X1}(x2|x1) = f_X(x1, x2) / f_{X1}(x1) = 6 x1 x2² / (2 x1) = 3 x2².
The definition of the conditional pmf for discrete rvs is analogous to the continuous case.
Example 1.19. Let S and G denote the smoking status and gender as defined in Example 1.16. The probability that a randomly selected student is a smoker given that the student is a male can be calculated as
P_{S|G}(S = 1|G = 0) = 0.20/0.60 = 1/3,
while the probability that a randomly selected student is female, given that the student smokes, can be calculated as
P_{G|S}(G = 1|S = 1) = 0.10/0.30 = 1/3.
The conditional pdfs allow us to calculate conditional expectations. The conditional expected value of a function g(X2) given that X1 = x1 is defined by
E[g(X2)|X1 = x1] = Σ_{x2 ∈ X2} g(x2) p_{X2|X1}(x2|x1)     for a discrete rv,
E[g(X2)|X1 = x1] = ∫_{X2} g(x2) f_{X2|X1}(x2|x1) dx2      for a continuous rv.   (1.13)
Example 1.20. The conditional mean and variance of X2 given a value of X1, for the variables defined in Example 1.17, are
E(X2|X1 = x1) = ∫_0^1 x2 · 3 x2² dx2 = 3/4,
and
var(X2|X1 = x1) = E(X2²|X1 = x1) − [E(X2|X1 = x1)]²
               = ∫_0^1 x2² · 3 x2² dx2 − (3/4)² = 3/80.
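A numerical check (a minimal sketch, not part of the original text; scipy is an assumed tool) using the conditional pdf f_{X2|X1}(x2|x1) = 3 x2² from Example 1.18:

from scipy import integrate

# Conditional pdf of X2 given X1 = x1 in Examples 1.17/1.18: 3*x2**2 on (0, 1).
f_cond = lambda x2: 3 * x2**2

mean, _ = integrate.quad(lambda x2: x2 * f_cond(x2), 0, 1)
second_moment, _ = integrate.quad(lambda x2: x2**2 * f_cond(x2), 0, 1)

print(mean)                      # 0.75   = 3/4
print(second_moment - mean**2)   # 0.0375 = 3/80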
Lemma 1.2. For random variables X and Y defined on supports X and Y, respectively, and for a function g(·) whose expectation exists, the following result holds:
E[g(Y)] = E{E[g(Y)|X]}.
Proof. By the definition of conditional expectation we can write
E[g(Y)|X = x] = ∫_Y g(y) f_{Y|X}(y|x) dy.
This is a function of x whose expectation is
E{E[g(Y)|X]} = ∫_X [∫_Y g(y) f_{Y|X}(y|x) dy] f_X(x) dx
             = ∫_X ∫_Y g(y) f_{Y|X}(y|x) f_X(x) dy dx     (the product f_{Y|X}(y|x) f_X(x) is the joint pdf f_{(X,Y)}(x, y))
             = ∫_Y g(y) [∫_X f_{(X,Y)}(x, y) dx] dy        (the inner integral is the marginal pdf f_Y(y))
             = ∫_Y g(y) f_Y(y) dy
             = E[g(Y)].
The following two equalities result from the above lemma.
1. E(Y ) = E{E[Y |X]};
2. var(Y ) = E[var(Y |X)] + var(E[Y |X]).
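The lemma and both equalities can be illustrated by simulation (a minimal sketch, not part of the original text; the hierarchical model below is an arbitrary choice): take X ∼ Exp(1) and Y|X = x ∼ N(x, 1), so that E[Y|X] = X and var(Y|X) = 1, giving E(Y) = 1 and var(Y) = 1 + 1 = 2.

import numpy as np

# Monte Carlo illustration of E(Y) = E[E(Y|X)] and
# var(Y) = E[var(Y|X)] + var(E[Y|X]) for X ~ Exp(1), Y | X = x ~ N(x, 1).
rng = np.random.default_rng(2)
x = rng.exponential(scale=1.0, size=500_000)
y = rng.normal(loc=x, scale=1.0)

print(y.mean())   # close to 1 = E(X)
print(y.var())    # close to 2 = E[var(Y|X)] + var(E[Y|X]) = 1 + 1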
1.8.4 Independence of Random Variables
Definition 1.11. Let X = (X1, X2 ) denote a continuous bivariate
rv with joint pdf fX (x1, x2) and marginal pdfs fX1 (x1) and fX2 (x2).
Then X1 and X2 are called independent random variables if, for
every x1 ∈ X1 and x2 ∈ X2
fX (x1, x2) = fX1 (x1)fX2 (x2).
(1.14)
We define independent discrete random variables analogously.
Note: For n random variables X1, . . . , Xn to be mutually independent the condition is
f_X(x1, . . . , xn) = ∏_{i=1}^{n} f_{Xi}(xi),
for all elements (x1, . . . , xn) of the support of the random variable X = (X1, . . . , Xn).
If X1 and X2 are independent, then the conditional pdf of X2 given X1 = x1 is
f_{X2|X1}(x2|x1) = f_X(x1, x2) / f_{X1}(x1) = f_{X1}(x1) f_{X2}(x2) / f_{X1}(x1) = f_{X2}(x2),
regardless of the value of x1. An analogous property holds for the conditional pdf of X1 given X2 = x2.
Example 1.21. It is easy to notice that for the variables defined in
Example 1.17 we have
f_X(x1, x2) = 6 x1 x2² = (2 x1)(3 x2²) = f_{X1}(x1) f_{X2}(x2).
So, the variables X1 and X2 are independent.
In fact, two rvs are independent if and only if there exist functions
g(x1) and h(x2) such that
fX (x1, x2) = g(x1)h(x2)
and the elements of the support X1 do not depend on the elements of
the support X2 (and vice versa).
Theorem 1.10. Let X1 and X2 be independent random variables.
Then
1. For any A ⊂ R and B ⊂ R
P (X1 ⊆ A, X2 ⊆ B) = P (X1 ⊆ A)P (X2 ⊆ B),
that is, {X1 ⊆ A} and {X2 ⊆ B} are independent events.
2. For g(X1 ), a function of X1 only, and for h(X2 ), a function of
X2 only, we have
E[g(X1)h(X2 )] = E[g(X1)] E[h(X2)].
Proof. Assume that X1 and X2 are continuous random variables. To
prove the theorem for discrete rvs we follow the same steps with
sums instead of integrals.
1. We have
P(X1 ⊆ A, X2 ⊆ B) = ∫_B ∫_A f_X(x1, x2) dx1 dx2
                   = ∫_B ∫_A f_{X1}(x1) f_{X2}(x2) dx1 dx2
                   = ∫_B [∫_A f_{X1}(x1) dx1] f_{X2}(x2) dx2
                   = [∫_A f_{X1}(x1) dx1] [∫_B f_{X2}(x2) dx2]
                   = P(X1 ⊆ A) P(X2 ⊆ B).
2. Similar arguments as in part 1 give
E[g(X1)h(X2)] = ∫_{-∞}^{∞} ∫_{-∞}^{∞} g(x1) h(x2) f_X(x1, x2) dx1 dx2
              = ∫_{-∞}^{∞} ∫_{-∞}^{∞} g(x1) h(x2) f_{X1}(x1) f_{X2}(x2) dx1 dx2
              = ∫_{-∞}^{∞} [∫_{-∞}^{∞} g(x1) f_{X1}(x1) dx1] h(x2) f_{X2}(x2) dx2
              = [∫_{-∞}^{∞} g(x1) f_{X1}(x1) dx1] [∫_{-∞}^{∞} h(x2) f_{X2}(x2) dx2]
              = E[g(X1)] E[h(X2)].
In the following theorem we will apply this result for the moment
generating function of a sum of independent random variables.
Theorem 1.11. Let X1 and X2 be independent random variables
with moment generating functions MX1 (t) and MX2 (t), respectively.
Then the moment generating function of the sum Y = X1 + X2 is
given by
MY (t) = MX1 (t)MX2 (t).
Proof. By the definition of the mgf and by Theorem 1.10, part 2, we
have
M_Y(t) = E(e^{tY}) = E(e^{t(X1+X2)}) = E(e^{tX1} e^{tX2}) = E(e^{tX1}) E(e^{tX2}) = M_{X1}(t) M_{X2}(t).
Note: The results presented in this section can be easily extended to
any number of mutually independent random variables.
Example 1.22. Let X1 ∼ N(µ1, σ1²) and X2 ∼ N(µ2, σ2²) be independent. What is the distribution of Y = X1 + X2?
Using Theorem 1.11 we can write
M_Y(t) = M_{X1}(t) M_{X2}(t) = exp{µ1 t + σ1² t²/2} exp{µ2 t + σ2² t²/2} = exp{(µ1 + µ2) t + (σ1² + σ2²) t²/2}.
This is the mgf of a normal rv with E(Y) = µ1 + µ2 and var(Y) = σ1² + σ2². Hence, when X1 and X2 are independent and normally distributed,
X1 + X2 ∼ N(µ1 + µ2, σ1² + σ2²).
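A simulation check (a minimal sketch, not part of the original text; the specific parameter values are arbitrary):

import numpy as np

# Monte Carlo check of Example 1.22 with illustrative parameters
# mu1, sigma1 = 1, 2 and mu2, sigma2 = -3, 1: the sum should be N(-2, 5).
rng = np.random.default_rng(3)
y = rng.normal(1.0, 2.0, size=500_000) + rng.normal(-3.0, 1.0, size=500_000)

print(y.mean())   # close to -2 = mu1 + mu2
print(y.var())    # close to  5 = sigma1**2 + sigma2**2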
Example 1.23. A part of an electronic system has two types of components in joint operation. Denote by X1 and X2 the random length of life of the component of type I and of type II, respectively. The joint density function of the two rvs is given by
f_X(x1, x2) = (1/8) x1 exp{−(x1 + x2)/2} I_X(x1, x2),
where X = {(x1, x2) : x1 > 0, x2 > 0}. The engineers are interested in the expected value of the so-called relative efficiency of the two components, which is expressed by
E(X2/X1).
It is easy to see that the two rvs are independent:
f_X(x1, x2) = [(1/8) x1 e^{−x1/2}] [e^{−x2/2}] = g(x1) h(x2).
The joint pdf can be written as a product of two functions, one depending on x1 only and the other on x2 only. Also, the support of X1
does not depend on the support of X2 . Hence, the random variables
X1 and X2 are independent. Now, the expectation can be written as
E(X2/X1) = E(1/X1) E(X2).
To calculate the expectations we need the marginal pdf of X1 and of
X2 .
The marginal pdf of X1 is
f_{X1}(x1) = ∫_0^∞ (1/8) x1 e^{−(x1+x2)/2} dx2 = (1/8) x1 e^{−x1/2} ∫_0^∞ e^{−x2/2} dx2 = (1/4) x1 e^{−x1/2}.
Hence, the marginal pdf of X2 must be
f_{X2}(x2) = (1/2) e^{−x2/2}.
The expectations are
E(1/X1) = (1/4) ∫_0^∞ (1/x1) x1 e^{−x1/2} dx1 = (1/4) ∫_0^∞ e^{−x1/2} dx1 = 1/2,
E(X2) = (1/2) ∫_0^∞ x2 e^{−x2/2} dx2 = 2.
Finally,
E(X2/X1) = E(1/X1) E(X2) = (1/2) · 2 = 1.
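A Monte Carlo check (a minimal sketch, not part of the original text; it assumes the marginals derived above, i.e. X1 with the Gamma(2, scale 2) density (1/4) x1 e^{−x1/2} and X2 exponential with mean 2, drawn independently):

import numpy as np

# Monte Carlo check of Example 1.23 using the derived marginals:
# f_X1(x1) = (1/4) x1 exp(-x1/2)  -> Gamma(shape=2, scale=2),
# f_X2(x2) = (1/2) exp(-x2/2)     -> Exponential(scale=2), independent of X1.
rng = np.random.default_rng(4)
x1 = rng.gamma(shape=2.0, scale=2.0, size=1_000_000)
x2 = rng.exponential(scale=2.0, size=1_000_000)

print(np.mean(1 / x1))    # close to 0.5 = E(1/X1)
print(np.mean(x2))        # close to 2   = E(X2)
print(np.mean(x2 / x1))   # close to 1   = E(X2/X1)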