1.7 Functions of Random Variables

If $X$ is a random variable with cdf $F_X(x)$, then any function of $X$, say $Y = g(X)$, is also a random variable. The question then is "what is the distribution of $Y$?"

The function $y = g(x)$ is a mapping from the induced sample space of the random variable $X$, $\mathcal{X}$, to a new sample space, $\mathcal{Y}$, of the random variable $Y$, that is,
$$g(x) : \mathcal{X} \to \mathcal{Y}.$$
The inverse mapping $g^{-1}$ acts from $\mathcal{Y}$ to $\mathcal{X}$ and we can write
$$g^{-1}(A) = \{x \in \mathcal{X} : g(x) \in A\}, \quad A \subset \mathcal{Y}.$$
Then we have
$$P_Y(Y \in A) = P_Y(g(X) \in A) = P_X(\{x \in \mathcal{X} : g(x) \in A\}) = P_X\big(X \in g^{-1}(A)\big).$$

The following theorem relates the cumulative distribution functions of $X$ and $Y = g(X)$.

Theorem 1.7. Let $X$ have cdf $F_X(x)$, let $Y = g(X)$, and let the domain and codomain of $g$, respectively, be
$$\mathcal{X} = \{x : f_X(x) > 0\} \quad \text{and} \quad \mathcal{Y} = \{y : y = g(x) \text{ for some } x \in \mathcal{X}\}.$$
(a) If $g$ is an increasing function on $\mathcal{X}$, then $F_Y(y) = F_X\big(g^{-1}(y)\big)$ for $y \in \mathcal{Y}$.
(b) If $g$ is a decreasing function on $\mathcal{X}$, then $F_Y(y) = 1 - F_X\big(g^{-1}(y)\big)$ for $y \in \mathcal{Y}$.

Proof. The cdf of $Y = g(X)$ can be written as
$$F_Y(y) = P_Y(Y \le y) = P_Y(g(X) \le y) = P_X(\{x \in \mathcal{X} : g(x) \le y\}) = \int_{\{x \in \mathcal{X} : g(x) \le y\}} f_X(x)\,dx.$$

(a) If $g$ is increasing, then $\{x \in \mathcal{X} : g(x) \le y\} = \{x \in \mathcal{X} : x \le g^{-1}(y)\}$. So we can write
$$F_Y(y) = \int_{\{x \in \mathcal{X} : x \le g^{-1}(y)\}} f_X(x)\,dx = \int_{-\infty}^{g^{-1}(y)} f_X(x)\,dx = F_X\big(g^{-1}(y)\big).$$

(b) If $g$ is decreasing, then $\{x \in \mathcal{X} : g(x) \le y\} = \{x \in \mathcal{X} : x \ge g^{-1}(y)\}$. So we can write
$$F_Y(y) = \int_{\{x \in \mathcal{X} : x \ge g^{-1}(y)\}} f_X(x)\,dx = \int_{g^{-1}(y)}^{\infty} f_X(x)\,dx = 1 - F_X\big(g^{-1}(y)\big). \qquad \Box$$

Example 1.12. Find the distribution of $Y = g(X) = -\log X$, where $X \sim U(0,1)$.

The cdf of $X$ is
$$F_X(x) = \begin{cases} 0, & x \le 0;\\ x, & 0 < x < 1;\\ 1, & x \ge 1. \end{cases}$$
For $x \in (0,1)$ the function $g(x) = -\log x$ maps onto $\mathcal{Y} = (0, \infty)$ and is decreasing. For $y > 0$, $y = -\log x$ implies that $x = e^{-y}$, i.e., $g^{-1}(y) = e^{-y}$, and
$$F_Y(y) = 1 - F_X\big(g^{-1}(y)\big) = 1 - F_X(e^{-y}) = 1 - e^{-y}.$$
Hence we may write $F_Y(y) = (1 - e^{-y})\,I_{(0,\infty)}(y)$. This is the exponential distribution function with $\lambda = 1$.

For continuous rvs we have the following result.

Theorem 1.8. Let $X$ have pdf $f_X(x)$ and let $Y = g(X)$, where $g$ is a monotone function. Suppose that $f_X(x)$ is continuous on its support $\mathcal{X} = \{x : f_X(x) > 0\}$ and that $g^{-1}(y)$ has a continuous derivative on the support $\mathcal{Y} = \{y : y = g(x) \text{ for some } x \in \mathcal{X}\}$. Then the pdf of $Y$ is given by
$$f_Y(y) = f_X\big(g^{-1}(y)\big)\,\Big|\frac{d}{dy} g^{-1}(y)\Big|\,I_{\mathcal{Y}}(y).$$

Proof.
$$f_Y(y) = \frac{d}{dy} F_Y(y) = \begin{cases} \frac{d}{dy} F_X\big(g^{-1}(y)\big), & \text{if } g \text{ is increasing;}\\ \frac{d}{dy}\big[1 - F_X\big(g^{-1}(y)\big)\big], & \text{if } g \text{ is decreasing,} \end{cases}$$
$$= \begin{cases} f_X\big(g^{-1}(y)\big)\,\frac{d}{dy} g^{-1}(y), & \text{if } g \text{ is increasing;}\\ -f_X\big(g^{-1}(y)\big)\,\frac{d}{dy} g^{-1}(y), & \text{if } g \text{ is decreasing.} \end{cases}$$
Note that when $g(x)$ is decreasing (increasing), so is $g^{-1}(y)$. Hence we get the thesis of the theorem. $\Box$

Example 1.13. Suppose that $Z \sim N(0,1)$. What is the distribution of $Y = Z^2$?

For $y > 0$, the cdf of $Y = Z^2$ is
$$F_Y(y) = P_Y(Y \le y) = P_Y(Z^2 \le y) = P_Z(-\sqrt{y} \le Z \le \sqrt{y}) = F_Z(\sqrt{y}) - F_Z(-\sqrt{y}).$$
The pdf can now be obtained by differentiation:
$$f_Y(y) = \frac{d}{dy} F_Y(y) = \frac{d}{dy}\big[F_Z(\sqrt{y}) - F_Z(-\sqrt{y})\big] = \frac{1}{2\sqrt{y}} f_Z(\sqrt{y}) + \frac{1}{2\sqrt{y}} f_Z(-\sqrt{y}).$$
Now, for the standard normal distribution we have
$$f_Z(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}, \quad -\infty < z < \infty.$$
This gives
$$f_Y(y) = \frac{1}{2\sqrt{y}}\,\frac{1}{\sqrt{2\pi}}\, e^{-(\sqrt{y})^2/2} + \frac{1}{2\sqrt{y}}\,\frac{1}{\sqrt{2\pi}}\, e^{-(-\sqrt{y})^2/2} = \frac{1}{\sqrt{y}\,\sqrt{2\pi}}\, e^{-y/2}, \quad 0 < y < \infty.$$
This is the pdf of a chi-squared random variable with one degree of freedom. That is, if $Z \sim N(0,1)$, then $Z^2 \sim \chi^2_1$.

Note that $g(Z) = Z^2$ is not a monotone function, but the range of $Z$, $(-\infty, \infty)$, can be partitioned into subsets on which $g$ is monotone.
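Remark (numerical check). The transformation results above can be verified empirically. The following sketch, assuming NumPy and SciPy are available, simulates Examples 1.12 and 1.13 and compares the empirical probabilities for $-\log X$ and $Z^2$ with the $\mathrm{Exp}(1)$ and $\chi^2_1$ cdfs; it is an illustrative addition, not part of the derivation.

```python
import numpy as np
from scipy import stats

# Monte Carlo check of Examples 1.12 and 1.13 (illustrative sketch).
rng = np.random.default_rng(1)
n = 200_000

# Example 1.12: X ~ U(0,1), so Y = -log X should follow Exp(1).
y = -np.log(rng.uniform(size=n))
for t in (0.5, 1.0, 2.0):
    print(f"P(Y <= {t}): empirical {np.mean(y <= t):.4f}, "
          f"Exp(1) {stats.expon.cdf(t):.4f}")

# Example 1.13: Z ~ N(0,1), so Z^2 should follow chi-squared with 1 df.
y2 = rng.standard_normal(n) ** 2
for t in (0.5, 1.0, 3.0):
    print(f"P(Z^2 <= {t}): empirical {np.mean(y2 <= t):.4f}, "
          f"chi2_1 {stats.chi2.cdf(t, df=1):.4f}")
```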
1.8 Two-Dimensional Random Variables

Definition 1.8. Let $\Omega$ be a sample space and let $X_1, X_2$ be functions, each assigning a real number $X_1(\omega)$, $X_2(\omega)$ to every outcome $\omega \in \Omega$, that is,
$$X_1 : \Omega \to \mathcal{X}_1 \subset \mathbb{R} \quad \text{and} \quad X_2 : \Omega \to \mathcal{X}_2 \subset \mathbb{R}.$$
Then the pair $\mathbf{X} = (X_1, X_2)$ is called a two-dimensional random variable. The induced sample space (range) of the two-dimensional random variable is
$$\mathcal{X} = \{(x_1, x_2) : x_1 \in \mathcal{X}_1,\ x_2 \in \mathcal{X}_2\} \subset \mathbb{R}^2.$$
We will denote two-dimensional random variables by bold capital letters.

Definition 1.9. The cumulative distribution function of a two-dimensional rv $\mathbf{X} = (X_1, X_2)$ is
$$F_{\mathbf{X}}(x_1, x_2) = P_{\mathbf{X}}(X_1 \le x_1, X_2 \le x_2). \qquad (1.9)$$

1.8.1 Discrete Two-Dimensional Random Variables

If all values of $\mathbf{X} = (X_1, X_2)$ are countable, i.e., the values are in the range
$$\mathcal{X} = \{(x_{1i}, x_{2j}),\ i = 1, 2, \ldots,\ j = 1, 2, \ldots\},$$
then the variable is discrete. The cdf of a discrete rv $\mathbf{X} = (X_1, X_2)$ is
$$F_{\mathbf{X}}(x_1, x_2) = \sum_{x_{2j} \le x_2}\ \sum_{x_{1i} \le x_1} p_{\mathbf{X}}(x_{1i}, x_{2j}),$$
where $p_{\mathbf{X}}(x_{1i}, x_{2j})$ denotes the joint probability mass function and
$$p_{\mathbf{X}}(x_{1i}, x_{2j}) = P_{\mathbf{X}}(X_1 = x_{1i}, X_2 = x_{2j}).$$
As in the univariate case, the joint pmf satisfies the following conditions:
1. $p_{\mathbf{X}}(x_{1i}, x_{2j}) \ge 0$ for all $i, j$;
2. $\sum_{\mathcal{X}_2}\sum_{\mathcal{X}_1} p_{\mathbf{X}}(x_{1i}, x_{2j}) = 1$.

Example 1.14. Consider a discrete bivariate uniform distribution defined on a rectangular grid $\mathcal{X} = \{(x_{1i}, x_{2j}) : x_{1i} \in \mathcal{X}_1,\ x_{2j} \in \mathcal{X}_2\}$, where $\mathcal{X}_1 = \{x_{11}, \ldots, x_{1n}\}$ and $\mathcal{X}_2 = \{x_{21}, \ldots, x_{2m}\}$. Then the number of values is $n \times m = N$ and
$$p_{\mathbf{X}}(x_{1i}, x_{2j}) = \frac{1}{N}.$$

Expectations of functions of bivariate random variables are calculated in the same way as for univariate rvs. Let $g(x_1, x_2)$ be a real-valued function defined on $\mathcal{X}$. Then $g(\mathbf{X}) = g(X_1, X_2)$ is a rv and its expectation is
$$E[g(\mathbf{X})] = \sum_{\mathcal{X}} g(x_1, x_2)\, p_{\mathbf{X}}(x_1, x_2).$$

Example 1.15. Let $X_1$ and $X_2$ be random variables as defined in Example 1.14, with $\mathcal{X}_1 = \{1, 2, 3, 4\}$ and $\mathcal{X}_2 = \{3, 6, 9\}$. Then, for example, for $g(X_1, X_2) = X_1 X_2$ we obtain
$$E[g(\mathbf{X})] = \frac{1}{12}\,(1 \times 3 + 1 \times 6 + 1 \times 9 + \cdots + 4 \times 9) = 15.$$

Marginal pmfs

Each of the components of the two-dimensional rv is a random variable, and so we may be interested in calculating its probabilities, for example $P_{X_1}(X_1 = x_1)$. Such a univariate pmf is then derived in the context of the distribution of the other random variable. We call it the marginal pmf.

Theorem 1.9. Let $\mathbf{X} = (X_1, X_2)$ be a discrete bivariate random variable with joint pmf $p_{\mathbf{X}}(x_1, x_2)$. Then the marginal pmfs of $X_1$ and $X_2$, $p_{X_1}$ and $p_{X_2}$, are given respectively by
$$p_{X_1}(x_1) = P_{X_1}(X_1 = x_1) = \sum_{\mathcal{X}_2} p_{\mathbf{X}}(x_1, x_2) \quad \text{and} \quad p_{X_2}(x_2) = P_{X_2}(X_2 = x_2) = \sum_{\mathcal{X}_1} p_{\mathbf{X}}(x_1, x_2).$$

Proof. For $X_1$: denote $A_{x_1} = \{(x_1, x_2) : x_2 \in \mathcal{X}_2\}$. Then, for any $x_1 \in \mathcal{X}_1$, we may write
$$P(X_1 = x_1) = P(X_1 = x_1,\ x_2 \in \mathcal{X}_2) = P\big((X_1, X_2) \in A_{x_1}\big) = \sum_{(x_1, x_2) \in A_{x_1}} P(X_1 = x_1, X_2 = x_2) = \sum_{\mathcal{X}_2} p_{\mathbf{X}}(x_1, x_2).$$
For $X_2$ the proof is similar. $\Box$
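Remark (numerical check). The calculation in Example 1.15 and the marginalisation in Theorem 1.9 amount to summing over a pmf table. The short sketch below, assuming NumPy is available, tabulates the bivariate uniform pmf on $\{1,2,3,4\} \times \{3,6,9\}$, forms both marginals, and recovers $E(X_1 X_2) = 15$; it is an illustrative addition, not part of the text.

```python
import numpy as np

# Joint pmf of the discrete bivariate uniform of Examples 1.14-1.15:
# p(x1, x2) = 1/12 on the grid {1,2,3,4} x {3,6,9} (illustrative sketch).
x1_vals = np.array([1, 2, 3, 4])
x2_vals = np.array([3, 6, 9])
pmf = np.full((x1_vals.size, x2_vals.size), 1 / 12)  # rows: x1, cols: x2

# Marginal pmfs (Theorem 1.9): sum the joint pmf over the other variable.
p_x1 = pmf.sum(axis=1)   # each entry 3/12 = 0.25
p_x2 = pmf.sum(axis=0)   # each entry 4/12 = 1/3

# E[X1 X2]: sum of x1 * x2 * p(x1, x2) over the grid.
e_x1x2 = (np.outer(x1_vals, x2_vals) * pmf).sum()
print(p_x1, p_x2, e_x1x2)   # marginals and 15.0
```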
Example 1.16. Students in a class of 100 were classified according to gender (G) and smoking status (S) as follows:

                 Smoking
    Gender      s     q     n
    male       20    32     8     60
    female     10     5    25     40
               30    37    33    100

where s, q and n denote the smoking status "now smokes", "did smoke but quit" and "never smoked", respectively. Find the probability that a randomly selected student
1. is a male;
2. is a male smoker;
3. is either a smoker or did smoke but quit;
4. is a female who is a smoker or did smoke but quit.

Denote by $\mathbf{X} = (G, S)$ a two-dimensional rv, where $G : \text{Gender} \to \{0, 1\}$ and $S : \text{Smoking} \to \{1, 2, 3\}$. In the following table we have the distribution of $\mathbf{X}$ as well as the marginal distributions of $G$ and of $S$.

    G \ S          1       2       3     P(G = g_i)
      0          0.20    0.32    0.08       0.60
      1          0.10    0.05    0.25       0.40
    P(S = s_j)   0.30    0.37    0.33       1

Hence, we obtain
1. $P_G(G = 0) = 0.60$;
2. $P_{\mathbf{X}}(G = 0, S = 1) = 0.20$;
3. $P_S(S = 1) + P_S(S = 2) = 0.30 + 0.37 = 0.67$;
4. $P_{\mathbf{X}}(G = 1, S = 1) + P_{\mathbf{X}}(G = 1, S = 2) = 0.10 + 0.05 = 0.15$.

1.8.2 Continuous Two-Dimensional Random Variables

If the values of $\mathbf{X} = (X_1, X_2)$ are elements of an uncountable set in the Euclidean plane, then the variable is jointly continuous. For example, the values might be in the range $\mathcal{X} = \{(x_1, x_2) : a \le x_1 \le b,\ c \le x_2 \le d\}$ for some real $a, b, c, d$.

The cdf of a continuous rv $\mathbf{X} = (X_1, X_2)$ is defined as
$$F_{\mathbf{X}}(x_1, x_2) = P_{\mathbf{X}}(X_1 \le x_1, X_2 \le x_2) = \int_{-\infty}^{x_2}\int_{-\infty}^{x_1} f_{\mathbf{X}}(t_1, t_2)\,dt_1\,dt_2, \qquad (1.10)$$
where $f_{\mathbf{X}}(\cdot,\cdot)$ is the probability density function, which satisfies
1. $f_{\mathbf{X}}(x_1, x_2) \ge 0$ for all $(x_1, x_2) \in \mathbb{R}^2$;
2. $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{\mathbf{X}}(x_1, x_2)\,dx_1\,dx_2 = 1$.

Equation (1.10) implies that
$$\frac{\partial^2 F_{\mathbf{X}}(x_1, x_2)}{\partial x_1\,\partial x_2} = f_{\mathbf{X}}(x_1, x_2). \qquad (1.11)$$
Also, for any constants $a, b, c$ and $d$ such that $a \le b$, $c \le d$, the probability that the bivariate rv falls in the rectangle $(a, b) \times (c, d)$ is
$$P_{\mathbf{X}}(a \le X_1 \le b,\ c \le X_2 \le d) = \int_c^d\int_a^b f_{\mathbf{X}}(x_1, x_2)\,dx_1\,dx_2.$$

The marginal pdfs of $X_1$ and $X_2$ are defined similarly as in the discrete case, here using integrals:
$$f_{X_1}(x_1) = \int_{-\infty}^{\infty} f_{\mathbf{X}}(x_1, x_2)\,dx_2, \quad -\infty < x_1 < \infty,$$
$$f_{X_2}(x_2) = \int_{-\infty}^{\infty} f_{\mathbf{X}}(x_1, x_2)\,dx_1, \quad -\infty < x_2 < \infty.$$

Example 1.17. Calculate $P(\mathbf{X} \in A)$, where $A = \{(x_1, x_2) : x_1 + x_2 \ge 1\}$ and the joint pdf of $\mathbf{X} = (X_1, X_2)$ is defined by
$$f_{\mathbf{X}}(x_1, x_2) = \begin{cases} 6 x_1 x_2^2, & 0 < x_1 < 1,\ 0 < x_2 < 1,\\ 0, & \text{otherwise.} \end{cases}$$

The probability is a double integral of the pdf over the region $A$. The region is, however, limited by the domain on which the pdf is positive. We can write
$$A = \{(x_1, x_2) : x_1 + x_2 \ge 1,\ 0 < x_1 < 1,\ 0 < x_2 < 1\} = \{(x_1, x_2) : 1 - x_2 \le x_1 < 1,\ 0 < x_2 < 1\}.$$
Hence, the probability is
$$P(\mathbf{X} \in A) = \iint_A f_{\mathbf{X}}(x_1, x_2)\,dx_1\,dx_2 = \int_0^1\int_{1-x_2}^1 6 x_1 x_2^2\,dx_1\,dx_2 = 0.9.$$
Also, we can calculate the marginal pdfs:
$$f_{X_1}(x_1) = \int_0^1 6 x_1 x_2^2\,dx_2 = 2 x_1 x_2^3\big|_0^1 = 2 x_1, \qquad f_{X_2}(x_2) = \int_0^1 6 x_1 x_2^2\,dx_1 = 3 x_1^2 x_2^2\big|_0^1 = 3 x_2^2.$$
These functions allow us to calculate probabilities involving only one variable. For example,
$$P_{X_1}\Big(\tfrac{1}{4} < X_1 < \tfrac{1}{2}\Big) = \int_{1/4}^{1/2} 2 x_1\,dx_1 = \frac{3}{16}.$$
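Remark (numerical check). The double integral in Example 1.17 can be evaluated numerically. The sketch below, assuming SciPy is available, uses scipy.integrate.dblquad for $P(X_1 + X_2 \ge 1)$ and scipy.integrate.quad for the marginal probability $P(1/4 < X_1 < 1/2)$; it is an illustrative addition, not part of the example.

```python
from scipy import integrate

# Joint pdf of Example 1.17: f(x1, x2) = 6 x1 x2^2 on the unit square.
f = lambda x1, x2: 6 * x1 * x2 ** 2

# P(X1 + X2 >= 1): outer variable x2 in (0, 1), inner x1 in (1 - x2, 1).
# dblquad integrates func(inner, outer), so the lambda lists x1 first.
p_region, _ = integrate.dblquad(f, 0, 1, lambda x2: 1 - x2, lambda x2: 1)

# P(1/4 < X1 < 1/2) from the marginal pdf f_X1(x1) = 2 x1.
p_marginal, _ = integrate.quad(lambda x1: 2 * x1, 0.25, 0.5)

print(round(p_region, 4), round(p_marginal, 4))   # 0.9 and 0.1875 = 3/16
```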
Analogously to the discrete case, we define the expectation of a real-valued function $g(\mathbf{X})$ of the bivariate rv $\mathbf{X} = (X_1, X_2)$ as
$$E[g(\mathbf{X})] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x_1, x_2)\, f_{\mathbf{X}}(x_1, x_2)\,dx_1\,dx_2.$$
In both cases, discrete and continuous, the following linearity property of the expectation holds:
$$E[a\,g(\mathbf{X}) + b\,h(\mathbf{X}) + c] = a\,E[g(\mathbf{X})] + b\,E[h(\mathbf{X})] + c, \qquad (1.12)$$
where $a$, $b$ and $c$ are constants and $g$ and $h$ are functions of the bivariate rv $\mathbf{X} = (X_1, X_2)$.

1.8.3 Conditional Distributions

Definition 1.10. Let $\mathbf{X} = (X_1, X_2)$ denote a continuous bivariate rv with joint pdf $f_{\mathbf{X}}(x_1, x_2)$ and marginal pdfs $f_{X_1}(x_1)$ and $f_{X_2}(x_2)$. For any $x_1$ such that $f_{X_1}(x_1) > 0$, the conditional pdf of $X_2$ given that $X_1 = x_1$ is the function of $x_2$ defined by
$$f_{X_2|X_1}(x_2|x_1) = \frac{f_{\mathbf{X}}(x_1, x_2)}{f_{X_1}(x_1)}.$$
Analogously, we define the conditional pdf of $X_1$ given $X_2 = x_2$,
$$f_{X_1|X_2}(x_1|x_2) = \frac{f_{\mathbf{X}}(x_1, x_2)}{f_{X_2}(x_2)}.$$

It is easy to verify that these functions are pdfs. For example, for $X_2$ we can write
$$\int_{\mathcal{X}_2} f_{X_2|X_1}(x_2|x_1)\,dx_2 = \int_{\mathcal{X}_2} \frac{f_{\mathbf{X}}(x_1, x_2)}{f_{X_1}(x_1)}\,dx_2 = \frac{\int_{\mathcal{X}_2} f_{\mathbf{X}}(x_1, x_2)\,dx_2}{f_{X_1}(x_1)} = \frac{f_{X_1}(x_1)}{f_{X_1}(x_1)} = 1.$$
Note that the marginal pdf of $X_2$ (similarly of $X_1$) can be written as
$$f_{X_2}(x_2) = \int_{-\infty}^{\infty} \underbrace{f_{X_2|X_1}(x_2|x_1)\, f_{X_1}(x_1)}_{=\,f_{\mathbf{X}}(x_1, x_2)}\,dx_1.$$

Example 1.18. For the random variables defined in Example 1.17, the conditional pdfs are
$$f_{X_1|X_2}(x_1|x_2) = \frac{f_{\mathbf{X}}(x_1, x_2)}{f_{X_2}(x_2)} = \frac{6 x_1 x_2^2}{3 x_2^2} = 2 x_1$$
and
$$f_{X_2|X_1}(x_2|x_1) = \frac{f_{\mathbf{X}}(x_1, x_2)}{f_{X_1}(x_1)} = \frac{6 x_1 x_2^2}{2 x_1} = 3 x_2^2.$$

The definition of the conditional pmf for discrete rvs is analogous to the continuous case.

Example 1.19. Let $S$ and $G$ denote the smoking status and gender as defined in Example 1.16. The probability that a randomly selected student is a smoker, given that the student is a male, is
$$P_{S|G}(S = 1 \mid G = 0) = \frac{0.20}{0.60} = \frac{1}{3},$$
while the probability that a randomly selected student is female, given that the student smokes, is
$$P_{G|S}(G = 1 \mid S = 1) = \frac{0.10}{0.30} = \frac{1}{3}.$$

The conditional pdfs allow us to calculate conditional expectations. The conditional expected value of a function $g(X_2)$ given that $X_1 = x_1$ is defined by
$$E[g(X_2) \mid X_1 = x_1] = \begin{cases} \sum_{\mathcal{X}_2} g(x_2)\, p_{X_2|X_1}(x_2|x_1) & \text{for a discrete rv,}\\[4pt] \int_{\mathcal{X}_2} g(x_2)\, f_{X_2|X_1}(x_2|x_1)\,dx_2 & \text{for a continuous rv.} \end{cases} \qquad (1.13)$$

Example 1.20. The conditional mean and variance of $X_2$ given a value of $X_1$, for the variables defined in Example 1.17, are
$$E(X_2 \mid X_1 = x_1) = \int_0^1 x_2 \cdot 3 x_2^2\,dx_2 = \frac{3}{4}$$
and
$$\mathrm{var}(X_2 \mid X_1 = x_1) = E(X_2^2 \mid X_1 = x_1) - [E(X_2 \mid X_1 = x_1)]^2 = \int_0^1 x_2^2 \cdot 3 x_2^2\,dx_2 - \Big(\frac{3}{4}\Big)^2 = \frac{3}{80}.$$

Lemma 1.2. For random variables $X$ and $Y$ defined on supports $\mathcal{X}$ and $\mathcal{Y}$, respectively, and for a function $g(\cdot)$ whose expectation exists, the following result holds:
$$E[g(Y)] = E\{E[g(Y) \mid X]\}.$$

Proof. By the definition of conditional expectation we can write
$$E[g(Y) \mid X = x] = \int_{\mathcal{Y}} g(y)\, f_{Y|X}(y|x)\,dy.$$
This is a function of $x$ whose expectation is
$$E\{E[g(Y) \mid X]\} = \int_{\mathcal{X}} \Big[\int_{\mathcal{Y}} g(y)\, f_{Y|X}(y|x)\,dy\Big] f_X(x)\,dx = \int_{\mathcal{X}}\int_{\mathcal{Y}} g(y)\,\underbrace{f_{Y|X}(y|x)\, f_X(x)}_{=\,f_{(X,Y)}(x,y)}\,dy\,dx = \int_{\mathcal{Y}} g(y)\,\underbrace{\int_{\mathcal{X}} f_{(X,Y)}(x, y)\,dx}_{=\,f_Y(y)}\,dy = E[g(Y)]. \qquad \Box$$

The following two equalities result from the above lemma:
1. $E(Y) = E\{E[Y \mid X]\}$;
2. $\mathrm{var}(Y) = E[\mathrm{var}(Y \mid X)] + \mathrm{var}(E[Y \mid X])$.
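Remark (numerical check). For the variables of Example 1.17 the conditional distribution of $X_2$ does not depend on $x_1$ (as Example 1.18 shows), so Lemma 1.2 and its corollaries reduce to $E(X_2) = 3/4$ and $\mathrm{var}(X_2) = 3/80$. The sketch below, assuming NumPy is available and sampling by inversion of the marginal cdfs $F_{X_1}(x) = x^2$ and $F_{X_2}(x) = x^3$, checks these values by simulation; it is an illustrative addition, not part of the text.

```python
import numpy as np

# Monte Carlo illustration of Example 1.20 and Lemma 1.2 for the joint pdf
# of Example 1.17 (illustrative sketch; X1 and X2 are independent there,
# with marginal pdfs 2*x1 and 3*x2**2 on (0, 1)).
rng = np.random.default_rng(0)
n = 500_000

# Inverse-cdf sampling: F_X1(x) = x^2 and F_X2(x) = x^3 on (0, 1).
x1 = np.sqrt(rng.uniform(size=n))
x2 = rng.uniform(size=n) ** (1 / 3)

# E[X2 | X1 = x1] = 3/4 and var(X2 | X1 = x1) = 3/80 for every x1,
# so the tower rules give E(X2) = 3/4 and var(X2) = 3/80.
print("E[X2]   :", x2.mean(), "(theory 0.75)")
print("var(X2) :", x2.var(), "(theory", 3 / 80, ")")
```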
1.8.4 Independence of Random Variables

Definition 1.11. Let $\mathbf{X} = (X_1, X_2)$ denote a continuous bivariate rv with joint pdf $f_{\mathbf{X}}(x_1, x_2)$ and marginal pdfs $f_{X_1}(x_1)$ and $f_{X_2}(x_2)$. Then $X_1$ and $X_2$ are called independent random variables if, for every $x_1 \in \mathcal{X}_1$ and $x_2 \in \mathcal{X}_2$,
$$f_{\mathbf{X}}(x_1, x_2) = f_{X_1}(x_1)\, f_{X_2}(x_2). \qquad (1.14)$$
We define independent discrete random variables analogously.

Note: For $n$ random variables $X_1, \ldots, X_n$ to be mutually independent the condition is
$$f_{\mathbf{X}}(x_1, \ldots, x_n) = \prod_{i=1}^n f_{X_i}(x_i)$$
for all elements $(x_1, \ldots, x_n)$ of the support of the random variable $\mathbf{X} = (X_1, \ldots, X_n)$.

If $X_1$ and $X_2$ are independent, then the conditional pdf of $X_2$ given $X_1 = x_1$ is
$$f_{X_2|X_1}(x_2|x_1) = \frac{f_{\mathbf{X}}(x_1, x_2)}{f_{X_1}(x_1)} = \frac{f_{X_1}(x_1)\, f_{X_2}(x_2)}{f_{X_1}(x_1)} = f_{X_2}(x_2),$$
regardless of the value of $x_1$. An analogous property holds for the conditional pdf of $X_1$ given $X_2 = x_2$.

Example 1.21. It is easy to notice that for the variables defined in Example 1.17 we have
$$f_{\mathbf{X}}(x_1, x_2) = 6 x_1 x_2^2 = (2 x_1)(3 x_2^2) = f_{X_1}(x_1)\, f_{X_2}(x_2).$$
So the variables $X_1$ and $X_2$ are independent.

In fact, two rvs are independent if and only if there exist functions $g(x_1)$ and $h(x_2)$ such that $f_{\mathbf{X}}(x_1, x_2) = g(x_1)\,h(x_2)$ and the elements of the support $\mathcal{X}_1$ do not depend on the elements of the support $\mathcal{X}_2$ (and vice versa).

Theorem 1.10. Let $X_1$ and $X_2$ be independent random variables. Then
1. For any $A \subset \mathbb{R}$ and $B \subset \mathbb{R}$,
$$P(X_1 \in A,\ X_2 \in B) = P(X_1 \in A)\,P(X_2 \in B),$$
that is, $\{X_1 \in A\}$ and $\{X_2 \in B\}$ are independent events.
2. For $g(X_1)$, a function of $X_1$ only, and for $h(X_2)$, a function of $X_2$ only, we have
$$E[g(X_1)\,h(X_2)] = E[g(X_1)]\,E[h(X_2)].$$

Proof. Assume that $X_1$ and $X_2$ are continuous random variables. To prove the theorem for discrete rvs we follow the same steps with sums instead of integrals.
1. We have
$$P(X_1 \in A,\ X_2 \in B) = \int_B\int_A f_{\mathbf{X}}(x_1, x_2)\,dx_1\,dx_2 = \int_B\int_A f_{X_1}(x_1)\,f_{X_2}(x_2)\,dx_1\,dx_2 = \Big[\int_A f_{X_1}(x_1)\,dx_1\Big]\Big[\int_B f_{X_2}(x_2)\,dx_2\Big] = P(X_1 \in A)\,P(X_2 \in B).$$
2. Similar arguments as in Part 1 give
$$E[g(X_1)\,h(X_2)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x_1)\,h(x_2)\,f_{\mathbf{X}}(x_1, x_2)\,dx_1\,dx_2 = \Big[\int_{-\infty}^{\infty} g(x_1)\,f_{X_1}(x_1)\,dx_1\Big]\Big[\int_{-\infty}^{\infty} h(x_2)\,f_{X_2}(x_2)\,dx_2\Big] = E[g(X_1)]\,E[h(X_2)]. \qquad \Box$$

In the following theorem we apply this result to the moment generating function of a sum of independent random variables.

Theorem 1.11. Let $X_1$ and $X_2$ be independent random variables with moment generating functions $M_{X_1}(t)$ and $M_{X_2}(t)$, respectively. Then the moment generating function of the sum $Y = X_1 + X_2$ is given by
$$M_Y(t) = M_{X_1}(t)\,M_{X_2}(t).$$

Proof. By the definition of the mgf and by Theorem 1.10, part 2, we have
$$M_Y(t) = E\big(e^{tY}\big) = E\big(e^{t(X_1+X_2)}\big) = E\big(e^{tX_1} e^{tX_2}\big) = E\big(e^{tX_1}\big)\,E\big(e^{tX_2}\big) = M_{X_1}(t)\,M_{X_2}(t). \qquad \Box$$

Note: The results presented in this section can easily be extended to any number of mutually independent random variables.

Example 1.22. Let $X_1 \sim N(\mu_1, \sigma_1^2)$ and $X_2 \sim N(\mu_2, \sigma_2^2)$ be independent. What is the distribution of $Y = X_1 + X_2$?

Using Theorem 1.11 we can write
$$M_Y(t) = M_{X_1}(t)\,M_{X_2}(t) = \exp\{\mu_1 t + \sigma_1^2 t^2/2\}\exp\{\mu_2 t + \sigma_2^2 t^2/2\} = \exp\{(\mu_1 + \mu_2)t + (\sigma_1^2 + \sigma_2^2)t^2/2\}.$$
This is the mgf of a normal rv with $E(Y) = \mu_1 + \mu_2$ and $\mathrm{var}(Y) = \sigma_1^2 + \sigma_2^2$. Hence, when $X_1$ and $X_2$ are independent and normally distributed, $X_1 + X_2 \sim N(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2)$.
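Remark (numerical check). Example 1.22 can be checked by simulation. The sketch below, assuming NumPy and SciPy are available and using the arbitrary parameter values $\mu_1 = 1$, $\sigma_1 = 2$, $\mu_2 = -3$, $\sigma_2 = 1.5$, compares the simulated sum with $N(-2, 6.25)$; it is an illustrative addition, not part of the example.

```python
import numpy as np
from scipy import stats

# Monte Carlo check of Example 1.22 (illustrative sketch): the sum of
# independent N(1, 2^2) and N(-3, 1.5^2) variables should be
# N(1 - 3, 2^2 + 1.5^2) = N(-2, 6.25).
rng = np.random.default_rng(2)
n = 300_000
y = rng.normal(1.0, 2.0, size=n) + rng.normal(-3.0, 1.5, size=n)

print("mean:", y.mean(), "(theory -2)")
print("var :", y.var(), "(theory 6.25)")
# Kolmogorov-Smirnov comparison with the claimed normal distribution.
print(stats.kstest(y, "norm", args=(-2.0, np.sqrt(6.25))))
```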
Example 1.23. A part of an electronic system has two types of components in joint operation. Denote by $X_1$ and $X_2$ the random lengths of life of the components of type I and type II, respectively. The joint density function of the two rvs is given by
$$f_{\mathbf{X}}(x_1, x_2) = \frac{1}{8}\, x_1 \exp\Big\{-\frac{x_1 + x_2}{2}\Big\}\, I_{\mathcal{X}}(x_1, x_2),$$
where $\mathcal{X} = \{(x_1, x_2) : x_1 > 0,\ x_2 > 0\}$. The engineers are interested in the expected value of the so-called relative efficiency of the two components, which is expressed by
$$E\Big(\frac{X_2}{X_1}\Big).$$

It is easy to see that the two rvs are independent:
$$f_{\mathbf{X}}(x_1, x_2) = \frac{1}{8}\, x_1 e^{-x_1/2}\, e^{-x_2/2} = g(x_1)\,h(x_2).$$
The joint pdf can be written as a product of two functions, one depending on $x_1$ only and the other on $x_2$ only. Also, the support of $X_1$ does not depend on the support of $X_2$. Hence, the random variables $X_1$ and $X_2$ are independent.

Now, the expectation can be written as
$$E\Big(\frac{X_2}{X_1}\Big) = E\Big(\frac{1}{X_1}\Big)\,E(X_2).$$
To calculate the expectations we need the marginal pdfs of $X_1$ and of $X_2$. The marginal pdf of $X_1$ is
$$f_{X_1}(x_1) = \int_0^{\infty} \frac{1}{8}\, x_1 e^{-(x_1+x_2)/2}\,dx_2 = \frac{1}{8}\, x_1 e^{-x_1/2}\int_0^{\infty} e^{-x_2/2}\,dx_2 = \frac{1}{4}\, x_1 e^{-x_1/2}.$$
Hence, the marginal pdf of $X_2$ must be
$$f_{X_2}(x_2) = \frac{1}{2}\, e^{-x_2/2}.$$
The expectations are:
$$E\Big(\frac{1}{X_1}\Big) = \frac{1}{4}\int_0^{\infty} \frac{1}{x_1}\, x_1 e^{-x_1/2}\,dx_1 = \frac{1}{4}\int_0^{\infty} e^{-x_1/2}\,dx_1 = \frac{1}{2},$$
$$E(X_2) = \frac{1}{2}\int_0^{\infty} x_2 e^{-x_2/2}\,dx_2 = 2.$$
Finally,
$$E\Big(\frac{X_2}{X_1}\Big) = E\Big(\frac{1}{X_1}\Big)\,E(X_2) = \frac{1}{2}\cdot 2 = 1.$$
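Remark (numerical check). The marginal of $X_1$ in Example 1.23 is a Gamma density with shape 2 and scale 2, and the marginal of $X_2$ is exponential with scale 2. The sketch below, assuming NumPy is available, uses these facts to confirm $E(1/X_1) = 1/2$, $E(X_2) = 2$ and $E(X_2/X_1) = 1$ by simulation; it is an illustrative addition, not part of the example.

```python
import numpy as np

# Monte Carlo check of Example 1.23 (illustrative sketch): X1 and X2 are
# independent with X1 ~ Gamma(shape=2, scale=2) and X2 ~ Exponential(scale=2),
# so E[X2/X1] = E[1/X1] * E[X2] = (1/2) * 2 = 1.
rng = np.random.default_rng(3)
n = 1_000_000
x1 = rng.gamma(shape=2.0, scale=2.0, size=n)
x2 = rng.exponential(scale=2.0, size=n)

print("E[1/X1]  :", np.mean(1 / x1), "(theory 0.5)")
print("E[X2]    :", np.mean(x2), "(theory 2)")
print("E[X2/X1] :", np.mean(x2 / x1), "(theory 1)")
```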