Chapter 3 Discrete Random Variables and Probability Distributions
3.1 - Random Variables
3.2 - Probability Distributions for Discrete Random Variables
3.3 - Expected Values
3.4 - The Binomial Probability Distribution
3.5 - Hypergeometric and Negative Binomial Distributions
3.6 - The Poisson Probability Distribution

What is the connection between probability and random variables? Events (and their corresponding probabilities) that involve experimental measurements can be described by random variables.

Consider a POPULATION and a random variable X measured on it, for example X = cholesterol level (mg/dL). Each population value $x_i$ has a probability $p(x_i)$, and these probabilities total 1. A SAMPLE of size n drawn from the population yields data values $x_1, x_2, \dots, x_k$ with relative frequencies $p(x_i) = f_i/n$, which likewise total 1.

Population values   Probabilities    |   Data values   Relative frequencies
x_i                 p(x_i)           |   x_i           p(x_i) = f_i/n
x_1                 p(x_1)           |   x_1           p(x_1)
x_2                 p(x_2)           |   x_2           p(x_2)
x_3                 p(x_3)           |   x_3           p(x_3)
⋮                   ⋮                |   ⋮             ⋮
                                     |   x_k           p(x_k)
Total               1                |   Total         1

For a specific value x, $p(x) = P(X = x)$ is the probability that the random variable X equals x; as a function of x it is called the “probability mass function” (pmf). In a probability histogram, the bar at x has width $\Delta x$, height equal to the “density” $f(x)$, and area equal to the probability $p(x) = f(x)\,\Delta x$, so the total area of the histogram is 1.

Example: Let X = the value shown on a single random toss of a fair die (1, 2, 3, 4, 5, 6). X is said to be uniformly distributed over the values 1, 2, 3, 4, 5, 6.

x      1     2     3     4     5     6     Total
p(x)   1/6   1/6   1/6   1/6   1/6   1/6   1

Its probability histogram has six bars of width 1 and density (height) 1/6, for a total area of 1. What is the probability of rolling a 4? $p(4) = P(X = 4) = 1/6$.
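A pmf on finitely many values can be sketched as a Python dict mapping each value x to p(x). The sketch below encodes the fair-die pmf and checks the two defining requirements: every p(x) is nonnegative and the probabilities total 1. It uses exact `fractions.Fraction` arithmetic (an implementation choice, not part of the notes); the names `die_pmf` and `is_valid_pmf` are illustrative.

```python
from fractions import Fraction

# Fair die: each face 1..6 has probability 1/6 (exact fractions avoid round-off)
die_pmf = {x: Fraction(1, 6) for x in range(1, 7)}


def is_valid_pmf(pmf):
    """True if every p(x) >= 0 and the probabilities sum to exactly 1."""
    return all(p >= 0 for p in pmf.values()) and sum(pmf.values()) == 1


print(is_valid_pmf(die_pmf))  # True
print(die_pmf[4])             # 1/6, i.e., p(4) = P(X = 4)
```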
The “cumulative distribution function” (cdf) gives the probability that the random variable X is less than or equal to a specific value x:
$$F(x) = P(X \le x).$$

Example (fair die, continued): accumulating the pmf gives the cdf.

x                  1     2     3     4     5     6
p(x) = P(X = x)    1/6   1/6   1/6   1/6   1/6   1/6
F(x) = P(X ≤ x)    1/6   2/6   3/6   4/6   5/6   1
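Continuing the dict-based sketch (an assumed representation, not the notes' own), the cdf table is just a running sum of the pmf over the sorted values, and F(t) can also be evaluated at any real t. The helper names `cdf_table` and `cdf` are hypothetical. Note that `Fraction` prints reduced forms, so 2/6 appears as 1/3.

```python
from fractions import Fraction
from itertools import accumulate

die_pmf = {x: Fraction(1, 6) for x in range(1, 7)}


def cdf_table(pmf):
    """Map each value x (in increasing order) to F(x) = P(X <= x) via running sums."""
    xs = sorted(pmf)
    return dict(zip(xs, accumulate(pmf[x] for x in xs)))


def cdf(pmf, t):
    """F(t) = P(X <= t) for any real t, not only the values X can take."""
    return sum((p for x, p in pmf.items() if x <= t), Fraction(0))


for x, Fx in cdf_table(die_pmf).items():
    print(x, Fx)  # 1 1/6, 2 1/3, 3 1/2, 4 2/3, 5 5/6, 6 1

print(cdf(die_pmf, 3.5))  # 1/2: F stays flat between consecutive values
```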
Plotted against x, this cdf is a “staircase graph” that rises from 0 to 1, jumping by p(x) at each value x.

In general, the cdf is the running total of the pmf:

x       p(x)      F(x)
x_1     p(x_1)    F(x_1) = p(x_1)
x_2     p(x_2)    F(x_2) = p(x_1) + p(x_2)
x_3     p(x_3)    F(x_3) = p(x_1) + p(x_2) + p(x_3)
⋮       ⋮         ⋮
Total   1         (F increases from 0 to 1)

Calculating “interval probabilities”: let $a^-$ denote the population value immediately below a, so that $F(a^-) = P(X \le a^-) = P(X < a)$. Since $F(b) = P(X \le b)$,
$$F(b) - F(a^-) = P(X \le b) - P(X \le a^-) = P(a \le X \le b) = \sum_{x=a}^{b} p(x).$$
Writing $p(x) = f(x)\,\Delta x$, this reads $\sum_{x=a}^{b} f(x)\,\Delta x = F(b) - F(a^-)$, the discrete form of the Fundamental Theorem of Calculus, $\int_a^b f(x)\,dx = F(b) - F(a)$.

3.3 - Expected Values

Just as the sample mean $\bar{x}$ and sample variance $s^2$ were used to characterize the “center” and “spread” of a dataset, we can now use probabilities to define the “true” population mean $\mu$ and population variance $\sigma^2$:
• Population mean: $\mu = \sum x\,p(x)$, also denoted $E[X]$, the “expected value” of the variable X.
• Population variance: $\sigma^2 = \sum (x - \mu)^2\,p(x)$.

Example 1: X = cholesterol level (mg/dL), with

x      210   240   270   Total
p(x)   1/6   1/3   1/2   1

$$\mu = (210)(1/6) + (240)(1/3) + (270)(1/2) = 250$$
$$\sigma^2 = (-40)^2(1/6) + (-10)^2(1/3) + (20)^2(1/2) = 500$$

Example 2: Equally likely outcomes result in a “uniform distribution.”

x      180   210   240   Total
p(x)   1/3   1/3   1/3   1

$$\mu = (180)(1/3) + (210)(1/3) + (240)(1/3) = 210 \quad \text{(clear from symmetry)}$$
$$\sigma^2 = (-30)^2(1/3) + (0)^2(1/3) + (30)^2(1/3) = 600$$
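The two definitions translate directly into sums over a dict-based pmf (an assumed representation). This sketch reproduces Examples 1 and 2 exactly; the function names `mean` and `variance` are illustrative, not from the notes.

```python
from fractions import Fraction


def mean(pmf):
    """Population mean: mu = sum of x * p(x)."""
    return sum(x * p for x, p in pmf.items())


def variance(pmf):
    """Population variance: sigma^2 = sum of (x - mu)^2 * p(x)."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())


# Example 1 and Example 2 pmfs (cholesterol level, mg/dL)
ex1 = {210: Fraction(1, 6), 240: Fraction(1, 3), 270: Fraction(1, 2)}
ex2 = {180: Fraction(1, 3), 210: Fraction(1, 3), 240: Fraction(1, 3)}

print(mean(ex1), variance(ex1))  # 250 500
print(mean(ex2), variance(ex2))  # 210 600
```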
To summarize, the population quantities parallel the sample quantities:
• POPULATION (discrete random variable X): probability table of values $x_i$ with pmf $p(x_i)$; probability histogram with total area 1; $\mu = \sum x\,p(x)$ and $\sigma^2 = \sum (x - \mu)^2\,p(x)$.
• SAMPLE of size n: frequency table of data values $x_i$ with relative frequencies $p(x_i) = f_i/n$; density histogram with total area 1; $\bar{x} = \sum x\,p(x)$ and $s^2 = \frac{n}{n-1} \sum (x - \bar{x})^2\,p(x)$.

One final example…

Example 3: TWO INDEPENDENT POPULATIONS. Let $X_1$ and $X_2$ be cholesterol levels (mg/dL) in two independent populations:

x      p_1(x)            x      p_2(x)
210    1/6               180    1/3
240    1/3               210    1/3
270    1/2               240    1/3
Total  1                 Total  1
$\mu_1 = 250$, $\sigma_1^2 = 500$       $\mu_2 = 210$, $\sigma_2^2 = 600$

What is the distribution of the difference $D = X_1 - X_2$? Its possible values, and the outcomes $(x_1, x_2)$ that produce each one, are:

d      Outcomes (x_1, x_2)
-30    (210, 240)
0      (210, 210), (240, 240)
+30    (210, 180), (240, 210), (270, 240)
+60    (240, 180), (270, 210)
+90    (270, 180)

The outcomes of D are NOT equally likely, so simply counting pairs (which would give 1/9, 2/9, 3/9, 2/9, 1/9) is wrong: the nine pairs do not all have the same probability.
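The outcome table can be generated mechanically by enumerating all nine pairs and grouping them by d. Under the independence the example assumes, each pair's probability is p_1(x_1) times p_2(x_2), and the sketch confirms that these pair probabilities take three different values, which is why the outcomes of D are not equally likely. The dict representation and variable names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

p1 = {210: Fraction(1, 6), 240: Fraction(1, 3), 270: Fraction(1, 2)}  # pmf of X1
p2 = {180: Fraction(1, 3), 210: Fraction(1, 3), 240: Fraction(1, 3)}  # pmf of X2

# Group the nine (x1, x2) outcomes by the value d = x1 - x2 they produce
pairs_by_d = {}
for x1, x2 in product(p1, p2):
    pairs_by_d.setdefault(x1 - x2, []).append((x1, x2))

for d in sorted(pairs_by_d):
    print(d, pairs_by_d[d])

# Under independence each pair has probability p1(x1) * p2(x2); these are not all equal
joint = {(x1, x2): p1[x1] * p2[x2] for x1, x2 in product(p1, p2)}
print([str(p) for p in sorted(set(joint.values()))])  # ['1/18', '1/9', '1/6']
```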
Because $X_1$ and $X_2$ are independent, each pair's probability is the product of the individual probabilities, e.g., $P(X_1 = 210, X_2 = 240) = (1/6)(1/3) = 1/18$. Adding over the pairs that give each value d:

d      p(d)
-30    (1/6)(1/3) = 1/18
0      (1/6)(1/3) + (1/3)(1/3) = 3/18
+30    (1/6)(1/3) + (1/3)(1/3) + (1/2)(1/3) = 6/18
+60    (1/3)(1/3) + (1/2)(1/3) = 5/18
+90    (1/2)(1/3) = 3/18
Total  18/18 = 1

The probability histogram of D therefore has bar heights 1/18, 3/18, 6/18, 5/18, 3/18.
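The table above is a sum of products over pairs, which the following sketch automates for any two independent discrete variables held as dicts (an assumed representation; `pmf_of_difference` is a hypothetical helper name). It also computes the mean and variance of D so they can be compared with $\mu_1 - \mu_2$ and $\sigma_1^2 + \sigma_2^2$. `Fraction` prints reduced forms, so 3/18 appears as 1/6 and 6/18 as 1/3.

```python
from fractions import Fraction
from itertools import product

p1 = {210: Fraction(1, 6), 240: Fraction(1, 3), 270: Fraction(1, 2)}  # pmf of X1
p2 = {180: Fraction(1, 3), 210: Fraction(1, 3), 240: Fraction(1, 3)}  # pmf of X2


def pmf_of_difference(pa, pb):
    """pmf of A - B for independent A, B: add pa(a) * pb(b) over pairs with a - b = d."""
    pd = {}
    for a, b in product(pa, pb):
        pd[a - b] = pd.get(a - b, Fraction(0)) + pa[a] * pb[b]
    return dict(sorted(pd.items()))


def mean(pmf):
    return sum(x * p for x, p in pmf.items())


def variance(pmf):
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())


pD = pmf_of_difference(p1, p2)
print({d: str(p) for d, p in pD.items()})
# {-30: '1/18', 0: '1/6', 30: '1/3', 60: '5/18', 90: '1/6'}
print(mean(pD), variance(pD))  # 40 1100
```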
The mean and variance of D follow from the definitions:
$$\mu_D = (-30)(1/18) + (0)(3/18) + (30)(6/18) + (60)(5/18) + (90)(3/18) = 40,$$
$$\sigma_D^2 = (-70)^2(1/18) + (-40)^2(3/18) + (-10)^2(6/18) + (20)^2(5/18) + (50)^2(3/18) = 1100.$$
Note that $\mu_D = \mu_1 - \mu_2 = 250 - 210$ and $\sigma_D^2 = \sigma_1^2 + \sigma_2^2 = 500 + 600$.

General: for two populations, $\mathrm{Mean}(X_1 - X_2) = \mathrm{Mean}(X_1) - \mathrm{Mean}(X_2)$. IF the two populations are dependent, this mean formula still holds, BUT the variance picks up a covariance term:
$$\mathrm{Var}(X_1 - X_2) = \mathrm{Var}(X_1) + \mathrm{Var}(X_2) - 2\,\mathrm{Cov}(X_1, X_2).$$
For independent populations the covariance is 0, which gives $\sigma_D^2 = \sigma_1^2 + \sigma_2^2$ as above. These two formulas are valid for continuous as well as discrete distributions.

General Properties of “Expectation” of X

Mean: $\mu_X = E[X] = \sum x\,p(x)$.
Suppose X is transformed to another random variable, say h(X).
Then, by definition,
$$\mu_{h(X)} = E[h(X)] = \sum h(x)\,p(x).$$
Variance: the choice $h(x) = (x - \mu_X)^2$ gives
$$\sigma_X^2 = E[(X - \mu_X)^2] = \sum (x - \mu_X)^2\,p(x).$$

Suppose X is constant, say b, throughout the entire population. Then, by definition,
$$E[b] = \sum b\,p(x) = b \sum p(x) = b \cdot 1 = b.$$

Multiply X by any constant a (population values $a x_1, a x_2, a x_3, \dots$ with the same probabilities). Then
$$E[aX] = \sum a x\,p(x) = a \sum x\,p(x) = a\,E[X], \quad \text{i.e.,} \quad \mu_{aX} = a\,\mu_X.$$

Add any constant b to X (population values $x_1 + b, x_2 + b, x_3 + b, \dots$). Then
$$E[X + b] = \sum (x + b)\,p(x) = \sum x\,p(x) + \sum b\,p(x) = E[X] + E[b].$$
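Since every rule above is a special case of $E[h(X)] = \sum h(x)\,p(x)$, a single helper that takes h as a function can check them all. The sketch uses the Example 1 pmf; the helper name `expect` and the constants a = 3, b = 7 are arbitrary choices for illustration.

```python
from fractions import Fraction


def expect(pmf, h=lambda x: x):
    """E[h(X)] = sum of h(x) * p(x); with the default h this is E[X]."""
    return sum(h(x) * p for x, p in pmf.items())


pmf = {210: Fraction(1, 6), 240: Fraction(1, 3), 270: Fraction(1, 2)}  # Example 1
mu = expect(pmf)  # 250
a, b = 3, 7       # arbitrary constants chosen for illustration

print(expect(pmf, lambda x: b))                # 7     (E[b] = b)
print(expect(pmf, lambda x: a * x) == a * mu)  # True  (E[aX] = a E[X])
print(expect(pmf, lambda x: x + b) == mu + b)  # True  (E[X + b] = E[X] + b)
print(expect(pmf, lambda x: (x - mu) ** 2))    # 500   (variance as E[(X - mu)^2])
```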
Because $E[b] = b$, the rule for adding a constant simplifies to $E[X + b] = E[X] + b$, i.e., $\mu_{X+b} = \mu_X + b$. Combining the two rules:
$$E[aX + b] = a\,E[X] + b, \quad \text{i.e.,} \quad \mu_{aX+b} = a\,\mu_X + b.$$

Variance of aX: multiply X by any constant a; then $\mu_X$ is also multiplied by a, so
$$\sigma_{aX}^2 = E[(aX - a\mu_X)^2] = E[a^2 (X - \mu_X)^2] = a^2\,E[(X - \mu_X)^2] = a^2 \sigma_X^2.$$
That is, $\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X)$ and $\sigma_{aX} = |a|\,\sigma_X$, i.e., $\mathrm{SD}(aX) = |a|\,\mathrm{SD}(X)$.

Variance of X + b: add any constant b to X; then b is also added to $\mu_X$, so
$$\sigma_{X+b}^2 = E[((X + b) - (\mu_X + b))^2] = E[(X - \mu_X)^2] = \sigma_X^2.$$
That is, $\mathrm{Var}(X + b) = \mathrm{Var}(X)$ and $\mathrm{SD}(X + b) = \mathrm{SD}(X)$: shifting every value by b does not change the spread.
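A numerical check of these rules, again on the Example 1 pmf, can define variance itself through the expectation helper, $\mathrm{Var}(h(X)) = E[(h(X) - E[h(X)])^2]$. The constant a = -3 is deliberately negative (an arbitrary choice) to show why the standard deviation scales by |a| rather than a; `expect` and `var` are hypothetical helper names.

```python
import math
from fractions import Fraction


def expect(pmf, h=lambda x: x):
    """E[h(X)] = sum of h(x) * p(x)."""
    return sum(h(x) * p for x, p in pmf.items())


def var(pmf, h=lambda x: x):
    """Var(h(X)) = E[(h(X) - E[h(X)])^2]."""
    m = expect(pmf, h)
    return expect(pmf, lambda x: (h(x) - m) ** 2)


pmf = {210: Fraction(1, 6), 240: Fraction(1, 3), 270: Fraction(1, 2)}  # Example 1
a, b = -3, 7  # a < 0 on purpose: SD scales by |a|, not a

print(expect(pmf, lambda x: a * x + b) == a * expect(pmf) + b)  # True
print(var(pmf), var(pmf, lambda x: a * x), var(pmf, lambda x: x + b))  # 500 4500 500
sd_aX = math.sqrt(float(var(pmf, lambda x: a * x)))
print(math.isclose(sd_aX, abs(a) * math.sqrt(float(var(pmf)))))  # True
```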
Alternate computational formula: expanding the square and applying the rules above (with $E[1] = 1$),
$$\sigma_X^2 = E[(X - \mu_X)^2] = E[X^2 - 2\mu_X X + \mu_X^2] = E[X^2] - 2\mu_X E[X] + \mu_X^2 E[1] = E[X^2] - 2\mu_X^2 + \mu_X^2 = E[X^2] - \mu_X^2.$$
Hence
$$\sigma_X^2 = E[X^2] - (E[X])^2 = \sum x^2\,p(x) - \Big(\sum x\,p(x)\Big)^2.$$
This is the analogue of the “alternate computational formula” for the sample variance $s^2$.
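As a check, the definitional and computational forms can be compared on Examples 1 and 2. The sketch assumes the same dict representation as before; the function names are illustrative.

```python
from fractions import Fraction


def variance_by_definition(pmf):
    """sigma^2 = sum of (x - mu)^2 * p(x)."""
    mu = sum(x * p for x, p in pmf.items())
    return sum((x - mu) ** 2 * p for x, p in pmf.items())


def variance_by_shortcut(pmf):
    """sigma^2 = E[X^2] - (E[X])^2 = sum of x^2 p(x) - (sum of x p(x))^2."""
    ex = sum(x * p for x, p in pmf.items())
    ex2 = sum(x * x * p for x, p in pmf.items())
    return ex2 - ex ** 2


ex1 = {210: Fraction(1, 6), 240: Fraction(1, 3), 270: Fraction(1, 2)}  # Example 1
ex2 = {180: Fraction(1, 3), 210: Fraction(1, 3), 240: Fraction(1, 3)}  # Example 2

for pmf in (ex1, ex2):
    print(variance_by_definition(pmf), variance_by_shortcut(pmf))  # 500 500, then 600 600
```

With exact fractions the two forms agree exactly. In floating-point arithmetic, however, the shortcut subtracts two large, nearly equal numbers (for Example 1, $E[X^2] = 63000$ and $\mu^2 = 62500$), so it can lose precision when the mean is large relative to the spread; the definitional form avoids that cancellation.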