CHAPTER 4
• 4.1 - Discrete Models
   - General distributions
   - Classical: Binomial, Poisson, etc.
• 4.2 - Continuous Models
   - General distributions
   - Classical: Normal, etc.
What is the connection between probability and random variables?
Events (and their corresponding probabilities) that involve experimental measurements can be described by random variables (e.g., “X = # Males” in the previous gender equity example).
POPULATION
random variable X
Example: X = Cholesterol level (mg/dL)

Pop values xi     Probabilities f(xi)
x1                f(x1)
x2                f(x2)
x3                f(x3)
⋮                 ⋮
Total             1

SAMPLE of size n: x1, x2, x3, x4, x5, x6, …etc…., xn

Data values xi    Relative Frequencies f(xi) = fi/n
x1                f(x1)
x2                f(x2)
x3                f(x3)
⋮                 ⋮
xk                f(xk)
Total             1
Probability Histogram (horizontal axis X, Total Area = 1)
f(x) = Probability that the random variable X is equal to a specific value x, i.e.,
f(x) = P(X = x)
“probability mass function” (pmf)
F(x) = Probability that the random variable X is less than or equal to a specific value x, i.e.,
F(x) = P(X ≤ x)
“cumulative distribution function” (cdf)
Calculating probabilities…
P(a ≤ X ≤ b) = Σ f(x), summed over all values x from a to b
             = F(b) – F(a⁻), where F(a⁻) = P(X < a)
(For a discrete X, F(b) – F(a) by itself equals P(a < X ≤ b), which leaves out f(a).)
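The definitions of f(x), F(x), and P(a ≤ X ≤ b) above can be checked numerically. The following is a minimal Python sketch, not part of the original slides; the three-value probability table and the helper names pmf, cdf, and prob_between are assumptions chosen for illustration.

```python
from fractions import Fraction

# Illustrative probability table (an assumption for this sketch): value -> f(x)
table = {210: Fraction(1, 6), 240: Fraction(1, 3), 270: Fraction(1, 2)}

def pmf(x):
    """f(x) = P(X = x); zero for values that are not in the table."""
    return table.get(x, Fraction(0))

def cdf(x):
    """F(x) = P(X <= x): add up f(v) over all table values v <= x."""
    return sum((p for v, p in table.items() if v <= x), Fraction(0))

def prob_between(a, b):
    """P(a <= X <= b), summing f(x) directly over a <= x <= b."""
    return sum((p for v, p in table.items() if a <= v <= b), Fraction(0))

total = sum(table.values())          # a valid pmf sums to 1
p_interval = prob_between(240, 270)  # f(240) + f(270)
# For a discrete X, F(b) - F(a) drops the endpoint a, so f(a) is added back.
p_via_cdf = cdf(270) - cdf(240) + pmf(240)
print(total, p_interval, p_via_cdf)
```

Exact fractions are used instead of floats so that the two routes to the interval probability can be compared with plain equality.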
Just as the sample mean x̄ and sample variance s² were used to characterize the “measure of center” and “measure of spread” of a dataset, we can now define the “true” population mean μ and population variance σ², using probabilities.
• Population mean
   μ = Σ x f(x)
   Also denoted by E[X], the “expected value” of the variable X.
• Population variance
   σ² = Σ (x – μ)² f(x)
Example 1: POPULATION
random variable X
Example: X = Cholesterol level (mg/dL)

Pop values xi     Probabilities f(xi)
210               1/6
240               1/3
270               1/2
Total             1

μ = Σ x f(x) = (210)(1/6) + (240)(1/3) + (270)(1/2) = 250
σ² = Σ (x – μ)² f(x) = (–40)²(1/6) + (–10)²(1/3) + (20)²(1/2) = 500
Example 2: POPULATION
random variable X
Example: X = Cholesterol level (mg/dL)
Equally likely outcomes result in a “uniform distribution.”

Pop values xi     Probabilities f(xi)
180               1/3
210               1/3
240               1/3
Total             1

μ = Σ x f(x) = (180)(1/3) + (210)(1/3) + (240)(1/3) = 210 (clear from symmetry)
σ² = Σ (x – μ)² f(x) = (–30)²(1/3) + (0)²(1/3) + (30)²(1/3) = 600
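The arithmetic in Examples 1 and 2 can be reproduced with a short Python sketch. The function name mean_var and the dictionary layout (value mapped to probability) are assumptions made for illustration, not something the slides prescribe.

```python
from fractions import Fraction

def mean_var(table):
    """Return (mu, sigma^2) for a discrete table {x: f(x)}.

    mu      = sum of x * f(x)
    sigma^2 = sum of (x - mu)^2 * f(x)
    """
    mu = sum(x * p for x, p in table.items())
    var = sum((x - mu) ** 2 * p for x, p in table.items())
    return mu, var

# Probability tables copied from Example 1 and Example 2
example1 = {210: Fraction(1, 6), 240: Fraction(1, 3), 270: Fraction(1, 2)}
example2 = {180: Fraction(1, 3), 210: Fraction(1, 3), 240: Fraction(1, 3)}

mu1, var1 = mean_var(example1)
mu2, var2 = mean_var(example2)
print(mu1, var1)  # 250 500
print(mu2, var2)  # 210 600
```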
To summarize…
POPULATION
Discrete random variable X

Probability Table
Pop values xi     Probabilities f(xi)
x1                f(x1)
x2                f(x2)
x3                f(x3)
⋮                 ⋮
Total             1

Probability Histogram (horizontal axis X, Total Area = 1)
μ = Σ x f(x)
σ² = Σ (x – μ)² f(x)

SAMPLE of size n: x1, x2, x3, x4, x5, x6, …etc…., xn

Frequency Table
Data xi           Relative Frequencies f(xi) = fi/n
x1                f(x1)
x2                f(x2)
x3                f(x3)
⋮                 ⋮
xk                f(xk)
Total             1

Density Histogram (horizontal axis X, Total Area = 1)
x̄ = Σ x f(x)
s² = [n/(n – 1)] Σ (x – x̄)² f(x)
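The sample-side formulas in this summary can be illustrated with a small Python sketch. The six data values below are a made-up sample (an assumption, not data from the slides); the check against the standard library's statistics.mean and statistics.variance shows that the n/(n – 1) factor turns the relative-frequency sum into the usual sample variance.

```python
from collections import Counter
from fractions import Fraction
import statistics

# Hypothetical sample of size n = 6 (invented for illustration)
data = [210, 240, 240, 270, 270, 270]
n = len(data)

# Relative frequencies f(x_i) = f_i / n for each distinct data value
rel_freq = {x: Fraction(count, n) for x, count in Counter(data).items()}

# Sample mean: x_bar = sum of x * f(x)
x_bar = sum(x * f for x, f in rel_freq.items())

# Sample variance: s^2 = n/(n-1) * sum of (x - x_bar)^2 * f(x)
s2 = Fraction(n, n - 1) * sum((x - x_bar) ** 2 * f for x, f in rel_freq.items())

print(x_bar, s2)
print(statistics.mean(data), statistics.variance(data))
```

Without the n/(n – 1) factor the sum would be 500 for this sample, which is the population-style formula applied to the relative frequencies.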
One final example…
Example 3: TWO INDEPENDENT POPULATIONS

X1 = Cholesterol level (mg/dL), μ1 = 250, σ1² = 500
x        f1(x)
210      1/6
240      1/3
270      1/2
Total    1

X2 = Cholesterol level (mg/dL), μ2 = 210, σ2² = 600
x        f2(x)
180      1/3
210      1/3
240      1/3
Total    1

D = X1 – X2 ~ ???
d        Outcomes
–30      (210, 240)
0        (210, 210), (240, 240)
+30      (210, 180), (240, 210), (270, 240)
+60      (240, 180), (270, 210)
+90      (270, 180)
d        Outcomes                               Probabilities f(d)
–30      (210, 240)                             1/9 ?
0        (210, 210), (240, 240)                 2/9 ?
+30      (210, 180), (240, 210), (270, 240)     3/9 ?
+60      (240, 180), (270, 210)                 2/9 ?
+90      (270, 180)                             1/9 ?

The outcomes of D are NOT EQUALLY LIKELY!!!
d        Outcomes                               Probabilities f(d)
–30      (210, 240)                             (1/6)(1/3) = 1/18 via independence
0        (210, 210), (240, 240)                 (1/6)(1/3) + (1/3)(1/3) = 3/18
+30      (210, 180), (240, 210), (270, 240)     (1/6)(1/3) + (1/3)(1/3) + (1/2)(1/3) = 6/18
+60      (240, 180), (270, 210)                 (1/3)(1/3) + (1/2)(1/3) = 5/18
+90      (270, 180)                             (1/2)(1/3) = 3/18

Probability Histogram of D: bar heights 1/18, 3/18, 6/18, 5/18, 3/18 at d = –30, 0, +30, +60, +90
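The probability table for D can be rebuilt mechanically: list every pair (x1, x2), multiply the two marginal probabilities (valid only because the populations are independent), and add the products for pairs that share the same difference. A minimal Python sketch follows; the variable names f1, f2, and f_d mirror the slide notation but are otherwise assumptions.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Marginal probability tables from Example 3
f1 = {210: Fraction(1, 6), 240: Fraction(1, 3), 270: Fraction(1, 2)}
f2 = {180: Fraction(1, 3), 210: Fraction(1, 3), 240: Fraction(1, 3)}

# f_d[d] accumulates P(D = d) over all pairs (x1, x2) with x1 - x2 = d
f_d = defaultdict(Fraction)  # Fraction() is 0, so missing keys start at zero
for (x1, p1), (x2, p2) in product(f1.items(), f2.items()):
    f_d[x1 - x2] += p1 * p2  # independence: P(X1 = x1, X2 = x2) = f1(x1) f2(x2)

for d in sorted(f_d):
    # Fractions print in lowest terms, e.g. 3/18 appears as 1/6
    print(d, f_d[d])
```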
μD = (–30)(1/18) + (0)(3/18) + (30)(6/18) + (60)(5/18) + (90)(3/18) = 40
μD = μ1 – μ2

σD² = (–70)²(1/18) + (–40)²(3/18) + (–10)²(6/18) + (20)²(5/18) + (50)²(3/18) = 1100
σD² = σ1² + σ2²
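As a quick check of the two results for D, the sketch below recomputes the mean and variance from the probability table of D and compares them with μ1 – μ2 and σ1² + σ2². It is an illustrative Python sketch; the variable names are assumptions, and the numbers are taken from Example 3.

```python
from fractions import Fraction

# Probability table of D = X1 - X2 from Example 3: d -> f(d)
f_d = {
    -30: Fraction(1, 18),
    0: Fraction(3, 18),
    30: Fraction(6, 18),
    60: Fraction(5, 18),
    90: Fraction(3, 18),
}

# Mean and variance of D computed directly from its own table
mu_d = sum(d * p for d, p in f_d.items())
var_d = sum((d - mu_d) ** 2 * p for d, p in f_d.items())

# Means and variances of X1 and X2, as given in Example 3
mu_1, var_1 = 250, 500
mu_2, var_2 = 210, 600

print(mu_d, mu_1 - mu_2)     # 40 40
print(var_d, var_1 + var_2)  # 1100 1100
```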
General: TWO INDEPENDENT POPULATIONS
μD = μ1 – μ2
σD² = σ1² + σ2²

IF the two populations are dependent…
…then this formula still holds:
   Mean (X1 – X2) = Mean (X1) – Mean (X2)
BUT……
   Var (X1 – X2) = Var (X1) + Var (X2) – 2 Cov (X1, X2)

These two formulas are valid for continuous as well as discrete distributions.
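To see the covariance term at work, the sketch below uses a made-up joint distribution for (X1, X2) that keeps the same marginal tables as Example 3 but makes the two variables dependent. The joint table is purely hypothetical (it does not come from the slides), and the helper name expect is an assumption for illustration.

```python
from fractions import Fraction

sixth, third = Fraction(1, 6), Fraction(1, 3)

# Hypothetical joint pmf P(X1 = x1, X2 = x2); pairs not listed have probability 0.
# Row sums give f1 = 1/6, 1/3, 1/2 and column sums give f2 = 1/3, 1/3, 1/3.
joint = {
    (210, 180): sixth,
    (240, 180): sixth, (240, 210): sixth,
    (270, 210): sixth, (270, 240): third,
}

def expect(g):
    """Expected value of g(X1, X2) under the joint pmf."""
    return sum(g(x1, x2) * p for (x1, x2), p in joint.items())

mu1 = expect(lambda x1, x2: x1)
mu2 = expect(lambda x1, x2: x2)
var1 = expect(lambda x1, x2: (x1 - mu1) ** 2)
var2 = expect(lambda x1, x2: (x2 - mu2) ** 2)
cov = expect(lambda x1, x2: (x1 - mu1) * (x2 - mu2))

# Mean and variance of D = X1 - X2 computed directly from the joint pmf
mu_d = expect(lambda x1, x2: x1 - x2)
var_d = expect(lambda x1, x2: (x1 - x2 - mu_d) ** 2)

print(mu_d, mu1 - mu2)                 # 40 40
print(var_d, var1 + var2 - 2 * cov)    # 200 200
```

With this particular joint table the covariance is 450, so the variance of the difference drops from 1100 (the independent case) to 200, while the mean of the difference is unchanged.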