CHAPTER 4
• 4.1 - Discrete Models
    General distributions
    Classical: Binomial, Poisson, etc.
• 4.2 - Continuous Models
    General distributions
    Classical: Normal, etc.

What is the connection between probability and random variables?

Events (and their corresponding probabilities) that involve experimental measurements can be described by random variables (e.g., “X = # Males” in the previous gender equity example).

POPULATION: random variable X
Example: X = Cholesterol level (mg/dL)

    Pop values x_i    Probabilities f(x_i)
    x1                f(x1)
    x2                f(x2)
    x3                f(x3)
    ⋮                 ⋮
    Total             1

SAMPLE of size n: x1, x2, x3, x4, x5, x6, …etc…, xn

    Data values x_i   Relative frequencies f(x_i) = f_i / n
    x1                f(x1)
    x2                f(x2)
    x3                f(x3)
    ⋮                 ⋮
    xk                f(xk)
    Total             1

Probability Histogram (Total Area = 1)

• f(x) = probability that the random variable X is equal to a specific value x, i.e., $f(x) = P(X = x)$. This is the “probability mass function” (pmf).
• F(x) = probability that the random variable X is less than or equal to a specific value x, i.e., $F(x) = P(X \le x)$. This is the “cumulative distribution function” (cdf).

Calculating probabilities…

$$P(a \le X \le b) = \sum_{x=a}^{b} f(x) = F(b) - F(a^-)$$

Here $F(a^-) = P(X < a)$ is the cdf evaluated just below a. Because X is discrete and the endpoint a is included in the sum, subtracting $F(a)$ itself would drop the term $f(a)$.

Just as the sample mean $\bar{x}$ and sample variance $s^2$ were used to characterize the “measure of center” and “measure of spread” of a dataset, we can now define the “true” population mean $\mu$ and population variance $\sigma^2$, using probabilities.
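The pmf, the cdf and the interval probability described above can be sketched in a few lines of Python. The pmf values below are hypothetical, chosen only for illustration (they are not from the slides), and the helper names `cdf`, `prob_between` and `prob_strictly_below` are likewise assumptions. For a discrete variable the lower endpoint a is included in the interval, so the subtraction uses P(X < a) rather than P(X ≤ a).

```python
from fractions import Fraction as Fr

# Hypothetical pmf, for illustration only: value x -> f(x) = P(X = x)
pmf = {180: Fr(1, 10), 210: Fr(2, 10), 240: Fr(3, 10), 270: Fr(4, 10)}
assert sum(pmf.values()) == 1  # probabilities must total 1


def cdf(x):
    """F(x) = P(X <= x): add up f(v) over all values v <= x."""
    return sum((p for v, p in pmf.items() if v <= x), Fr(0))


def prob_strictly_below(a):
    """P(X < a), i.e. the cdf evaluated just below a."""
    return sum((p for v, p in pmf.items() if v < a), Fr(0))


def prob_between(a, b):
    """P(a <= X <= b), summing f(x) directly over a <= x <= b."""
    return sum((p for v, p in pmf.items() if a <= v <= b), Fr(0))


a, b = 210, 240
direct = prob_between(a, b)                 # f(210) + f(240)
via_cdf = cdf(b) - prob_strictly_below(a)   # F(b) - P(X < a)
print(direct, via_cdf)                      # 1/2 1/2
```

Using `Fraction` keeps the arithmetic exact, so the two routes to the same probability can be compared with `==` instead of a floating-point tolerance.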
• Population mean: $\mu = \sum x \, f(x)$
  Also denoted by E[X], the “expected value” of the variable X.
• Population variance: $\sigma^2 = \sum (x - \mu)^2 f(x)$

Example 1:
POPULATION: random variable X = Cholesterol level (mg/dL)

    Pop values x_i    Probabilities f(x_i)
    210               1/6
    240               1/3
    270               1/2
    Total             1

$$\mu = \sum x \, f(x) = (210)(1/6) + (240)(1/3) + (270)(1/2) = 250$$
$$\sigma^2 = \sum (x - \mu)^2 f(x) = (-40)^2(1/6) + (-10)^2(1/3) + (20)^2(1/2) = 500$$

Example 2:
POPULATION: random variable X = Cholesterol level (mg/dL)
Equally likely outcomes result in a “uniform distribution.”

    Pop values x_i    Probabilities f(x_i)
    180               1/3
    210               1/3
    240               1/3
    Total             1

$$\mu = \sum x \, f(x) = (180)(1/3) + (210)(1/3) + (240)(1/3) = 210 \quad \text{(clear from symmetry)}$$
$$\sigma^2 = \sum (x - \mu)^2 f(x) = (-30)^2(1/3) + (0)^2(1/3) + (30)^2(1/3) = 600$$

To summarize…

POPULATION: discrete random variable X
• Probability table: pop values x_i with probabilities f(x_i), total 1
• Probability histogram: total area = 1
• $\mu = \sum x \, f(x)$
• $\sigma^2 = \sum (x - \mu)^2 f(x)$

SAMPLE of size n: x1, x2, x3, x4, x5, x6, …etc…, xn
• Frequency table: data values x_i with relative frequencies f(x_i) = f_i / n, total 1
• Density histogram: total area = 1
• $\bar{x} = \sum x \, f(x)$
• $s^2 = \frac{n}{n-1} \sum (x - \bar{x})^2 f(x)$

(The next slide repeats this summary with “Continuous” in place of “Discrete” for the random variable X.)

One final example…

Example 3: TWO INDEPENDENT POPULATIONS

    X1 = Cholesterol level (mg/dL)        X2 = Cholesterol level (mg/dL)
    μ1 = 250, σ1² = 500                   μ2 = 210, σ2² = 600

    x        f1(x)                        x        f2(x)
    210      1/6                          180      1/3
    240      1/3                          210      1/3
    270      1/2                          240      1/3
    Total    1                            Total    1

D = X1 – X2 ~ ???

    d      Outcomes (x1, x2)
    -30    (210, 240)
    0      (210, 210), (240, 240)
    +30    (210, 180), (240, 210), (270, 240)
    +60    (240, 180), (270, 210)
    +90    (270, 180)

Simply counting outcomes would suggest probabilities 1/9, 2/9, 3/9, 2/9, 1/9 for these five values, but the outcomes of D are NOT EQUALLY LIKELY! Each outcome's probability is instead obtained via independence. For example, f(-30) = P(X1 = 210) · P(X2 = 240) = (1/6)(1/3) = 1/18.
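The calculations in Examples 1 to 3 can be checked with exact fractions. The sketch below is a minimal illustration, not part of the original slides. The helper names `mean` and `variance` are assumptions, and the pmf of D is built on the stated assumption of independence, so that each pair (x1, x2) has probability f1(x1) · f2(x2).

```python
from collections import defaultdict
from fractions import Fraction as Fr

# pmfs from Examples 1 and 2 (value -> probability)
f1 = {210: Fr(1, 6), 240: Fr(1, 3), 270: Fr(1, 2)}
f2 = {180: Fr(1, 3), 210: Fr(1, 3), 240: Fr(1, 3)}


def mean(pmf):
    """mu = sum of x * f(x)."""
    return sum(x * p for x, p in pmf.items())


def variance(pmf):
    """sigma^2 = sum of (x - mu)^2 * f(x)."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())


print(mean(f1), variance(f1))  # 250 500
print(mean(f2), variance(f2))  # 210 600

# pmf of D = X1 - X2 under independence:
# P(X1 = x1 and X2 = x2) = f1(x1) * f2(x2), accumulated by the value of x1 - x2
pmf_d = defaultdict(Fr)  # Fraction() is 0, so missing keys start at 0
for x1, p1 in f1.items():
    for x2, p2 in f2.items():
        pmf_d[x1 - x2] += p1 * p2

for d in sorted(pmf_d):
    print(d, pmf_d[d])
```

`Fraction` prints in lowest terms, so 3/18 appears as 1/6 and 6/18 as 1/3. The five probabilities sum to 1 and are clearly not the 1/9, 2/9, 3/9, 2/9, 1/9 that naive counting would give.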
Probabilities of D = X1 – X2, via independence:

    d      Outcomes (x1, x2)                     Probabilities f(d)
    -30    (210, 240)                            (1/6)(1/3) = 1/18
    0      (210, 210), (240, 240)                (1/6)(1/3) + (1/3)(1/3) = 3/18
    +30    (210, 180), (240, 210), (270, 240)    (1/6)(1/3) + (1/3)(1/3) + (1/2)(1/3) = 6/18
    +60    (240, 180), (270, 210)                (1/3)(1/3) + (1/2)(1/3) = 5/18
    +90    (270, 180)                            (1/2)(1/3) = 3/18

Probability Histogram of D: bar heights 1/18, 3/18, 6/18, 5/18, 3/18 at d = -30, 0, +30, +60, +90.

$$\mu_D = (-30)(1/18) + (0)(3/18) + (30)(6/18) + (60)(5/18) + (90)(3/18) = 40, \quad \text{so } \mu_D = \mu_1 - \mu_2$$
$$\sigma_D^2 = (-70)^2(1/18) + (-40)^2(3/18) + (-10)^2(6/18) + (20)^2(5/18) + (50)^2(3/18) = 1100, \quad \text{so } \sigma_D^2 = \sigma_1^2 + \sigma_2^2$$

General: TWO INDEPENDENT POPULATIONS

    Mean(X1 – X2) = Mean(X1) – Mean(X2)
    Var(X1 – X2) = Var(X1) + Var(X2)

IF the two populations are dependent… then the formula for the mean still holds, BUT the variance picks up a covariance term:

    Var(X1 – X2) = Var(X1) + Var(X2) – 2 Cov(X1, X2)

Under independence Cov(X1, X2) = 0, which recovers the simpler variance formula. These two formulas are valid for continuous as well as discrete distributions.
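The two general formulas for the mean and variance of a difference can be checked numerically from a joint distribution. The sketch below makes two assumptions that are not in the slides. The helper names `expect` and `diff_stats` are illustrative. The dependent joint pmf is hypothetical, constructed so that X2 always equals X1 - 30 (its X2 marginal therefore differs from Example 2).

```python
from fractions import Fraction as Fr


def expect(joint, g):
    """E[g(X1, X2)] for a joint pmf given as {(x1, x2): probability}."""
    return sum(g(x1, x2) * p for (x1, x2), p in joint.items())


def diff_stats(joint):
    """Return (mean of D, variance of D, mu1 - mu2, var1 + var2 - 2*cov) for D = X1 - X2."""
    mu1 = expect(joint, lambda a, b: a)
    mu2 = expect(joint, lambda a, b: b)
    var1 = expect(joint, lambda a, b: (a - mu1) ** 2)
    var2 = expect(joint, lambda a, b: (b - mu2) ** 2)
    cov = expect(joint, lambda a, b: (a - mu1) * (b - mu2))
    mu_d = expect(joint, lambda a, b: a - b)                  # computed directly from D
    var_d = expect(joint, lambda a, b: (a - b - mu_d) ** 2)   # computed directly from D
    return mu_d, var_d, mu1 - mu2, var1 + var2 - 2 * cov


# Independent case from Example 3: joint probability is the product of the marginals
f1 = {210: Fr(1, 6), 240: Fr(1, 3), 270: Fr(1, 2)}
f2 = {180: Fr(1, 3), 210: Fr(1, 3), 240: Fr(1, 3)}
independent = {(x1, x2): p1 * p2 for x1, p1 in f1.items() for x2, p2 in f2.items()}

# Hypothetical dependent case (an assumption for illustration): X2 = X1 - 30 always
dependent = {(210, 180): Fr(1, 6), (240, 210): Fr(1, 3), (270, 240): Fr(1, 2)}

print(diff_stats(independent))
print(diff_stats(dependent))
```

In the independent case the direct calculation and the formula both give mean 40 and variance 1100, with the covariance equal to 0. In the hypothetical dependent case D is always 30, so its variance is 0, and the formula agrees only because the covariance term (here 500) is subtracted twice from 500 + 500.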