Chapter 8.
Jointly Distributed Random Variables
Suppose we have two random variables X and Y defined on a sample space S with probability measure P(E), E ⊆ S.
Note that S could be the set of outcomes from two ‘experiments’, and the sample space
points be two-dimensional; then perhaps X could relate to the first experiment, and Y to the
second.
Then from before we know to define the marginal probability distributions PX and PY by, for example,
PX(B) = P(X⁻¹(B)),  B ⊆ R.
We now define the joint probability distribution:
Definition 8.0.1. Given a pair of random variables, X and Y, we define the joint probability distribution PXY as follows:
PXY(BX, BY) = P{X⁻¹(BX) ∩ Y⁻¹(BY)},  BX, BY ⊆ R.
So PXY(BX, BY), the probability that X ∈ BX and Y ∈ BY, is given by the probability of the set of all points in the sample space that get mapped both into BX by X and into BY by Y.
More generally, for a single region BXY ⊆ R², find the collection of sample space elements
SXY = {s ∈ S | (X(s), Y(s)) ∈ BXY}
and define
PXY(BXY) = P(SXY).
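For a concrete feel, here is a minimal numerical sketch, assuming Python is available; the two-dice sample space is invented purely for illustration. It computes PXY directly from Definition 8.0.1 by enumerating the sample points mapped into both sets.

    from itertools import product

    S = list(product(range(1, 7), repeat=2))   # two fair dice: 36 equally likely outcomes
    P = {s: 1/36 for s in S}                   # probability measure on S

    def X(s): return s[0]                      # X relates to the first experiment
    def Y(s): return s[1]                      # Y relates to the second

    def P_XY(B_X, B_Y):
        # P{X^-1(B_X) ∩ Y^-1(B_Y)}: total mass of sample points mapped into both sets
        return sum(P[s] for s in S if X(s) in B_X and Y(s) in B_Y)

    print(P_XY({1, 2}, {6}))                   # 2/36 = 1/18 for this two-dice setup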
8.0.1 Joint Cumulative Distribution Function
We define the joint cumulative distribution as follows:
Definition 8.0.2. Given a pair of random variables, X and Y, the joint cumulative distribution
function is defined as
FXY(x, y) = PXY(X ≤ x, Y ≤ y),  x, y ∈ R.
It is easy to check that the marginal cdfs for X and Y can be recovered by
FX(x) = FXY(x, ∞),  x ∈ R,
FY(y) = FXY(∞, y),  y ∈ R,
and that the two definitions will agree.
8.0.2 Properties of Joint CDF FXY
For FXY to be a valid cdf, we need to make sure the following conditions hold.
1. 0 ≤ FXY(x, y) ≤ 1, ∀ x, y ∈ R;
2. Monotonicity: ∀ x1, x2, y1, y2 ∈ R,
x1 < x2 ⇒ FXY(x1, y1) ≤ FXY(x2, y1)  and  y1 < y2 ⇒ FXY(x1, y1) ≤ FXY(x1, y2);
3. ∀ x, y ∈ R,
FXY(x, −∞) = 0,  FXY(−∞, y) = 0  and  FXY(∞, ∞) = 1.
Suppose we are interested in whether the random variable pair (X, Y) lies in the interval cross product (x1, x2] × (y1, y2]; that is, whether x1 < X ≤ x2 and y1 < Y ≤ y2.
[Figure: the rectangle (x1, x2] × (y1, y2] in the (X, Y)-plane, with X on the horizontal axis and Y on the vertical.]
First note that PXY(x1 < X ≤ x2, Y ≤ y) = FXY(x2, y) − FXY(x1, y).
It is then easy to see that PXY(x1 < X ≤ x2, y1 < Y ≤ y2) is given by
FXY(x2, y2) − FXY(x1, y2) − FXY(x2, y1) + FXY(x1, y1).
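As a quick numerical sanity check of this inclusion–exclusion identity, here is a sketch assuming Python; the joint cdf of two independent unit-rate exponentials is an invented test case, for which the rectangle probability is also a simple product.

    import math

    def F(x, y):
        # joint cdf of two independent unit-rate exponentials (invented test case)
        return (1 - math.exp(-x)) * (1 - math.exp(-y)) if x > 0 and y > 0 else 0.0

    x1, x2, y1, y2 = 0.5, 1.5, 0.2, 2.0
    rect = F(x2, y2) - F(x1, y2) - F(x2, y1) + F(x1, y1)

    # for this factorised cdf the rectangle probability is a product of interval probabilities
    direct = (math.exp(-x1) - math.exp(-x2)) * (math.exp(-y1) - math.exp(-y2))
    print(rect, direct)                    # the two values agree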
8.0.3 Joint Probability Mass Functions
Definition 8.0.3. If X and Y are both discrete random variables, then we can define the joint probability mass function as
fXY(x, y) = PXY(X = x, Y = y),  x, y ∈ R.
We can recover the marginal pmfs fX and fY since, by the law of total probability, ∀ x, y ∈ R,
fX(x) = ∑_y fXY(x, y),  fY(y) = ∑_x fXY(x, y).
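For instance, in a sketch assuming Python with numpy (the 2×3 joint pmf table below is made up for illustration), the marginal pmfs are just row and column sums of the joint table:

    import numpy as np

    f_XY = np.array([[0.10, 0.20, 0.10],   # invented joint pmf table:
                     [0.30, 0.15, 0.15]])  # rows indexed by x, columns by y

    f_X = f_XY.sum(axis=1)                 # sum over y for each x
    f_Y = f_XY.sum(axis=0)                 # sum over x for each y
    print(f_X, f_Y, f_XY.sum())            # marginals, and total mass 1.0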
Properties of Joint PMFs
For f XY to be a valid pmf, we need to make sure the following conditions hold.
1. 0 ≤ fXY(x, y) ≤ 1, ∀ x, y ∈ R;
2. ∑_x ∑_y fXY(x, y) = 1.
8.0.4 Joint Probability Density Functions
On the other hand, if ∃ fXY : R × R → R s.t.
PXY(BXY) = ∫∫_{(x,y)∈BXY} fXY(x, y) dx dy,  BXY ⊆ R × R,
then we say X and Y are jointly continuous and we refer to fXY as the joint probability density function of X and Y.
In this case, we have
FXY(x, y) = ∫_{t=−∞}^{y} ∫_{s=−∞}^{x} fXY(s, t) ds dt,  x, y ∈ R.
Definition 8.0.4. By the Fundamental Theorem of Calculus we can identify the joint pdf as
fXY(x, y) = ∂²/∂x∂y FXY(x, y).
Furthermore, we can recover the marginal densities fX and fY:
fX(x) = d/dx FX(x) = d/dx FXY(x, ∞) = d/dx ∫_{y=−∞}^{∞} ∫_{s=−∞}^{x} fXY(s, y) ds dy.
By the Fundamental Theorem of Calculus, and through a symmetric argument for Y, we thus
get
fX(x) = ∫_{y=−∞}^{∞} fXY(x, y) dy,  fY(y) = ∫_{x=−∞}^{∞} fXY(x, y) dx.
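Numerically, the marginal density at a point is a one-dimensional integral of the joint pdf. A sketch assuming Python with scipy; the joint pdf below (two independent exponentials with rates 1 and 2) is an invented example:

    import numpy as np
    from scipy.integrate import quad

    def f_XY(x, y):
        # invented joint pdf: X ~ Exp(1) and Y ~ Exp(2), independent
        return 2.0 * np.exp(-x - 2.0 * y) if x > 0 and y > 0 else 0.0

    x = 1.0
    f_X_at_x, _ = quad(lambda y: f_XY(x, y), 0, np.inf)   # integrate out y
    print(f_X_at_x, np.exp(-1.0))                          # both equal f_X(1) = e^{-1}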
Properties of Joint PDFs
For f XY to be a valid pdf, we need to make sure the following conditions hold.
1. fXY(x, y) ≥ 0, ∀ x, y ∈ R;
2. ∫_{y=−∞}^{∞} ∫_{x=−∞}^{∞} fXY(x, y) dx dy = 1.
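The normalisation in property 2 can be checked in the same spirit, continuing the invented exponential example above with scipy's double integrator:

    import numpy as np
    from scipy.integrate import dblquad

    def f_XY(x, y):
        return 2.0 * np.exp(-x - 2.0 * y) if x > 0 and y > 0 else 0.0

    # dblquad integrates func(y, x): the inner variable comes first in the integrand
    total, _ = dblquad(lambda y, x: f_XY(x, y), 0, np.inf,
                       lambda x: 0, lambda x: np.inf)
    print(total)                           # ≈ 1.0, as property 2 requires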
8.1 Independence and Expectation
8.1.1 Independence
Two random variables X and Y are independent if and only if ∀ BX, BY ⊆ R,
PXY(BX, BY) = PX(BX) PY(BY).
More specifically,
Definition 8.1.1. Two random variables X and Y are independent if and only if
fXY(x, y) = fX(x) fY(y),  ∀ x, y ∈ R.
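On a discrete joint pmf this factorisation is easy to test. A sketch assuming Python with numpy; the 2×2 table is invented so that it does factorise:

    import numpy as np

    f_XY = np.array([[0.06, 0.14],         # invented joint pmf table (rows: x, cols: y)
                     [0.24, 0.56]])
    f_X, f_Y = f_XY.sum(axis=1), f_XY.sum(axis=0)

    # independence <=> the joint table equals the outer product of its marginals
    print(np.allclose(f_XY, np.outer(f_X, f_Y)))   # True for this table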
Definition 8.1.2. For two random variables X, Y we define the conditional probability distribution PY|X by
PY|X(BY | BX) = PXY(BX, BY) / PX(BX),  BX, BY ⊆ R (for PX(BX) > 0).
This is the revised probability of Y falling inside BY given that we now know X ∈ BX .
Then we have X and Y are independent ⇐⇒ PY|X(BY | BX) = PY(BY), ∀ BX, BY ⊆ R.
Definition 8.1.3. For random variables X, Y we define the conditional probability density function fY|X by
fY|X(y | x) = fXY(x, y) / fX(x),  x, y ∈ R.
Note The random variables X and Y are independent ⇐⇒ fY|X(y | x) = fY(y), ∀ x, y ∈ R.
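In the discrete case the same renormalisation works row by row on the joint table. A sketch assuming Python with numpy, reusing the invented 2×3 pmf from earlier:

    import numpy as np

    f_XY = np.array([[0.10, 0.20, 0.10],   # invented joint pmf table (rows: x, cols: y)
                     [0.30, 0.15, 0.15]])
    f_X = f_XY.sum(axis=1)

    f_Y_given_X = f_XY / f_X[:, None]      # renormalise each row x by f_X(x)
    print(f_Y_given_X.sum(axis=1))         # every conditional pmf row sums to 1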
8.1.2 Expectation
Suppose we have a (measurable) bivariate function of interest of the random variables X and
Y, g : R × R → R.
Definition 8.1.4. If X and Y are discrete, we define E{g(X, Y)} by
EXY{g(X, Y)} = ∑_x ∑_y g(x, y) fXY(x, y).
Definition 8.1.5. If X and Y are jointly continuous, we define E{g(X, Y)} by
EXY{g(X, Y)} = ∫_{y=−∞}^{∞} ∫_{x=−∞}^{∞} g(x, y) fXY(x, y) dx dy.
Immediately from these definitions we have the following:
• If g(X, Y) = g1(X) + g2(Y),
EXY{g1(X) + g2(Y)} = EX{g1(X)} + EY{g2(Y)}.
• If g(X, Y) = g1(X) g2(Y) and X and Y are independent,
EXY{g1(X) g2(Y)} = EX{g1(X)} EY{g2(Y)}.
In particular, considering g(X, Y) = XY for independent X, Y we have
EXY(XY) = EX(X) EY(Y).
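A Monte Carlo check of this product rule for independent variables, in a sketch assuming Python with numpy; the choice of Exp(1) and U(0, 2), both with mean 1, is arbitrary:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.exponential(1.0, size=10**6)   # X ~ Exp(1), so E(X) = 1
    y = rng.uniform(0.0, 2.0, size=10**6)  # Y ~ U(0, 2) drawn independently, E(Y) = 1

    print((x * y).mean(), x.mean() * y.mean())   # both ≈ 1.0 = E(X)E(Y)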
Warning! In general EXY(XY) ≠ EX(X) EY(Y).
8.1.3 Conditional Expectation
Suppose X and Y are discrete random variables with joint pmf f(x, y). If we are given the value x of the random variable X, our revised pmf for Y is the conditional pmf f(y | x), for y ∈ R.
Definition 8.1.6. The conditional expectation of Y given X = x is therefore
EY|X(Y | X = x) = ∑_y y f(y | x).
Similarly,
Definition 8.1.7. If X and Y are continuous,
EY|X(Y | X = x) = ∫_{y=−∞}^{∞} y f(y | x) dy.
In either case, the conditional expectation is a function of x but not the unknown Y.
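The discrete case makes this concrete: in a sketch assuming Python with numpy (joint pmf table and support of Y invented for illustration), E(Y | X = x) comes out as one number per value of x.

    import numpy as np

    f_XY = np.array([[0.10, 0.20, 0.10],   # invented joint pmf (rows: x, cols: y)
                     [0.30, 0.15, 0.15]])
    y_vals = np.array([0.0, 1.0, 2.0])     # invented support of Y

    f_Y_given_X = f_XY / f_XY.sum(axis=1, keepdims=True)
    E_Y_given_X = f_Y_given_X @ y_vals     # one number for each value of x
    print(E_Y_given_X)                     # a function of x, with Y averaged out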
For a single variable X we considered the expectation of g(X) = (X − µX)(X − µX), called the variance and denoted σX².
The bivariate extension of this is the expectation of g(X, Y) = (X − µX)(Y − µY). We define the covariance of X and Y by
σXY = Cov(X, Y) = EXY[(X − µX)(Y − µY)].
Covariance measures how the random variables move in tandem with one another, and so is
closely related to the idea of correlation.
Definition 8.1.8. We define the correlation of X and Y by
ρXY = Cor(X, Y) = σXY / (σX σY).
Unlike the covariance, the correlation is invariant to the scale of the random variables X
and Y.
It is easily shown that if X and Y are independent random variables, then σXY = ρ XY = 0.
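Both quantities, and the scale-invariance of the correlation, are easy to see empirically. A sketch assuming Python with numpy; the linear construction of Y from X is invented to produce a known dependence:

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.normal(size=10**6)
    y = 0.5 * x + rng.normal(size=10**6)   # built to co-move with X

    print(np.cov(x, y)[0, 1])              # sample covariance, ≈ 0.5 here
    print(np.corrcoef(x, y)[0, 1])         # sample correlation, ≈ 0.45 here
    print(np.corrcoef(3.0 * x, 10.0 * y)[0, 1])  # unchanged: correlation ignores scale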
8.2 Examples
Example Suppose that the lifetime, X, and brightness, Y, of a light bulb are modelled as continuous random variables. Let their joint pdf be given by
f(x, y) = λ1 λ2 e^{−λ1 x − λ2 y},  x, y > 0.
Question Are lifetime and brightness independent?
Solution Yes: the joint pdf factorises as f(x, y) = (λ1 e^{−λ1 x})(λ2 e^{−λ2 y}) = fX(x) fY(y), the product of two exponential marginal pdfs, so lifetime and brightness are independent. □
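A numerical cross-check of the factorisation, in a sketch assuming Python with scipy; λ1 = 1 and λ2 = 2 are arbitrary choices for the check:

    import numpy as np
    from scipy.integrate import quad

    lam1, lam2 = 1.0, 2.0                  # arbitrary rate choices for the check
    f = lambda x, y: lam1 * lam2 * np.exp(-lam1 * x - lam2 * y)

    x0, y0 = 0.7, 0.3
    f_X, _ = quad(lambda y: f(x0, y), 0, np.inf)   # marginal density of X at x0
    f_Y, _ = quad(lambda x: f(x, y0), 0, np.inf)   # marginal density of Y at y0
    print(f(x0, y0), f_X * f_Y)                    # equal: the joint pdf factorises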
Example Suppose continuous random variables (X, Y) ∈ R² have joint pdf
f(x, y) = 1 if |x| + |y| < 1/√2, and f(x, y) = 0 otherwise.
Question Determine the marginal pdfs for X and Y.
Solution For |x| < 1/√2, fX(x) = ∫ f(x, y) dy is the length of the interval |y| < 1/√2 − |x|, which is √2 − 2|x|. So fX(x) = √2 − 2|x| for |x| < 1/√2, and fX(x) = 0 otherwise; by symmetry, fY(y) = √2 − 2|y| for |y| < 1/√2, and 0 otherwise. □
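The triangular shape of this marginal can be confirmed numerically. A sketch assuming Python with scipy; x0 = 0.2 is an arbitrary evaluation point, and the breakpoints passed to quad mark where the indicator switches off:

    import numpy as np
    from scipy.integrate import quad

    c = 1 / np.sqrt(2)
    f = lambda x, y: 1.0 if abs(x) + abs(y) < c else 0.0

    x0 = 0.2
    f_X_at_x0, _ = quad(lambda y: f(x0, y), -1, 1,
                        points=[-(c - x0), c - x0])   # breakpoints of the indicator
    print(f_X_at_x0, np.sqrt(2) - 2 * abs(x0))        # matches √2 − 2|x0|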