1
Schaum’s Outline
Probability and Statistics
Chapter 7
HYPOTHESIS TESTING
presented by Professor Carol Dahl
Examples by
Alfred Aird
Kira Jeffery
Catherine Keske
Hermann Logsend
Yris Olaya
2
Outline of Topics
Topics Covered
• Statistical Decisions
• Statistical Hypotheses
• Null Hypotheses
• Tests of Hypotheses
• Type I and Type II Errors
• Level of Significance
• Tests Involving the Normal Distribution
• One- and Two-Tailed Tests
• P-Value
3
Outline of Topics (Continued)
• Special Tests of Significance
• Large Samples
• Small Samples
• Estimation Theory/Hypotheses Testing Relationship
• Operating Characteristic Curves and Power of a Test
• Fitting Theoretical Distributions to Sample Frequency Distributions
• Chi-Square Test for Goodness of Fit
4
“The Truth Is Out There”
The Importance of Hypothesis Testing
Hypothesis testing
helps evaluate models based upon real data
enables one to build a statistical model
enhances your credibility as
analyst
economist
5
Statistical Decisions
Innocent until proven guilty principle
Want to prove someone is guilty
Assume the opposite or status quo - innocent
Ho: Innocent
H1: Guilty
Take subsample of possible information
If evidence not consistent with innocent - reject
Person is pronounced "not guilty", not "innocent"
6
Statistical Decisions
Status quo innocence = null hypothesis
Evidence = sample result
Reasonable doubt = confidence level
7
Statistical Decisions
Eg. Tantalum ore deposit
feasible if quality > 0.0600g/kg with 99% confidence
100 samples collected from large deposit at random.
Sample distribution
mean of 0.071 g/kg
variance 0.0025 (g/kg)², i.e. standard deviation 0.05 g/kg
8
Statistical Decisions
Should the deposit be developed?
Evidence = 0.071 (sample mean)
Reasonable doubt = 99%
Status quo = do not develop the deposit
Ho: µ ≤ 0.0600
H1: µ > 0.0600
9
Statistical Hypothesis
General Principles
Inferences about population using sample statistic
Prove A is true by assuming it isn’t true
Results of experiment (sample) compared with model
If results of model unlikely, reject model
If results explained by model, do not reject
10
Statistical Hypothesis
Event A fairly likely, model would be retained
Event B unlikely, model would be rejected
[Figure: density curve over z with 0 marked; event A lies near the center, event B in the shaded tail area]
11
Statistical Decisions
Should the deposit be developed?
Evidence = 0.071 (sample mean)
Reasonable doubt = 99%
Status quo = do not develop the deposit
Ho: µ = 0.0600
H1: µ > 0.0600
How likely is Ho given X̄ = 0.071?
12
Need Sampling Statistic
Need statistic with
population parameter
estimate for population parameter
its distribution
13
Need Sampling Statistic
Population Normal - Two Choices
Small Sample < 30
Known Variance
$\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$
Unknown Variance
$\dfrac{\bar{X} - \mu}{\hat{s}/\sqrt{n}} \sim t_{n-1}$
14
Need Sampling Statistic
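Population Not Normal
Large Sample
Known Variance
$\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \approx N(0,1)$
Unknown Variance
$\dfrac{\bar{X} - \mu}{\hat{s}/\sqrt{n}} \approx N(0,1)$
Doesn't matter whether the variance is known or not
If population is finite and sampling is without replacement,
need adjustment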
15
Normal Distribution
X ~ N(0,1)
µ = 0
±1 SD (68%)
±2 SD (95%)
±3 SD (99.7%)
16
Statistical Decisions
Should the deposit be developed?
Evidence: 0.071 g/kg (sample mean)
0.0025 (g/kg)² (sample variance)
0.05 g/kg (sample standard deviation)
Reasonable doubt = 99%
Status quo = do not develop the deposit
Ho: µ = 0.0600
H1: µ > 0.0600
One tailed test
How likely is Ho given X̄ = 0.071?
17
Hypothesis test
Evidence: 0.071 g/kg (sample mean)
0.05 g/kg (sample standard deviation)
Reasonable doubt = 99%
Status quo = do not develop the deposit
Ho: µ = 0.0600
H1: µ > 0.0600
$P\left(\dfrac{\bar{X} - \mu}{\hat{s}/\sqrt{n}} \le Z_c\right) = 0.99 = 1 - \alpha$
18
Statistical Hypothesis
E.g. $Z = \dfrac{0.071 - 0.0600}{0.05/\sqrt{100}} = 2.2$
Conclusion: Don't reject Ho, don't develop deposit
$2.2 < Z_c = 2.33$
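The tantalum test can be checked numerically. This is a minimal sketch using only Python's standard library. The variable names are illustrative and not part of the slides. The numbers are the ones quoted above (sample mean 0.071, standard deviation 0.05, n = 100, 99% one-tailed).

```python
from math import sqrt
from statistics import NormalDist

# Values quoted on the tantalum slides (names are illustrative).
x_bar = 0.071    # sample mean, g/kg
mu_0 = 0.0600    # mean under Ho, g/kg
s_hat = 0.05     # sample standard deviation, g/kg
n = 100          # number of samples
alpha = 0.01     # 99% confidence, one-tailed

# Large-sample test statistic and upper-tail critical value.
z = (x_bar - mu_0) / (s_hat / sqrt(n))
z_c = NormalDist().inv_cdf(1 - alpha)
p_value = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}, z_c = {z_c:.2f}, one-tailed p = {p_value:.4f}")
print("reject Ho" if z > z_c else "do not reject Ho")
```

Because z stays below the critical value, the status quo (do not develop) is retained at the 99% level.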
19
Null Hypothesis
Hypotheses cannot be proven
reject or fail to reject
based on likelihood of event occurring
null hypothesis is not accepted
20
Test of Hypotheses
Maple Creek Mine and
Potaro Diamond field in Guyana
• Mine potential for producing large diamonds
• Experts want to know true mean carat size produced
True mean said to be 4 carats
Experts want to know if true with 95% confidence
• Random sample taken
Sample mean found to be 3.6 carats
• Based on sample, is 4 carats true mean for mine?
21
Tests of Hypotheses
Tests referred to as:
“Tests of Hypotheses”
“Tests of Significance”
“Rules of Decision”
22
Types of Errors
Ho: µ = 4 (Suppose this is true)
H1: µ ≠ 4
Two tailed test
Choose α = 0.05
Sample n = 100 (assume X is normal), σ = 1
$P\left(-1.96 \le \dfrac{\bar{X} - 4}{\sigma/\sqrt{n}} \le 1.96\right) = 0.95 = 1 - \alpha$
23
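Type I error (α) – reject true
Ho: µ = 4 suppose true
$P\left(-1.96 \le \dfrac{\bar{X} - 4}{\sigma/\sqrt{n}} \le 1.96\right) = 0.95 = 1 - \alpha$
[Figure: sampling distribution under Ho with rejection area α/2 in each tail]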
24
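Type II Error (ß) - Fail to Reject False
Ho: µ = 4 not true
µ = 6 true
X̄ − 4 has mean 2, not mean 0
[Figure: two sampling distributions, one centered at 0 (µ = 4) and one at 2 (µ = 6); ß is the area under the µ = 6 curve inside the non-rejection region]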
25
Lower Type I
What happens to Type II?
Ho: µ = 4 not true
µ = 6 true
[Figure: the same two curves centered at 0 (µ = 4) and 2 (µ = 6), with area ß marked]
26
Higher µ
What happens to Type II?
Ho: µ = 4 not true
µ = 7 true
X̄ − 4 has mean 3, not mean 0
[Figure: two curves centered at 0 (µ = 4) and 3 (µ = 7), with area ß marked]
27
Type I and Type II Errors
Two types of errors can occur in hypothesis testing
To reduce errors, increase sample size when possible
P(Type I Error) = α
P(Type II Error) = β

                    Ho True             Ho False
Reject Ho           Type I Error        Correct Decision
Do Not Reject Ho    Correct Decision    Type II Error
28
To Reduce Errors
Increase sample size when possible
[Figure: sampling distributions of the mean for different sample sizes: population and n = 5, 10, 20]
29
Error Examples
Type I Error – rejecting a true null hypothesis
Convicting an innocent person
Rejecting true mean carat size is 4 when it is
Type II Error – not rejecting a false null hypothesis
Setting a guilty person free
Not rejecting mean carat size is 4 when it’s not
30
Level of Significance (α)
α = max probability we’re willing to risk Type I Error
= tail area of probability density function
If Type I Error’s “cost” high, choose α low
α defined before hypothesis test conducted
α typically defined as 0.10, 0.05 or 0.01
α = 0.10 for 90% confidence of not rejecting a true Ho
α = 0.05 for 95% confidence of not rejecting a true Ho
α = 0.01 for 99% confidence of not rejecting a true Ho
31
Diamond Hypothesis Test Example
Ho: µ = 4
H1: µ ≠ 4
Choose α = 0.01 for 99% confidence
Sample n = 100, σ = 1
X̄ = 3.6, −Zc = −2.575, Zc = 2.575
[Figure: standard normal density with rejection area 0.005 in each tail, beyond −2.575 and 2.575]
32
Example Continued
$z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} = -2$
$-2\ (z)$ is not $< -2.575\ (-z_{\alpha/2})$
• Observed not "significantly" different from expected
Fail to reject null hypothesis
• At the 1% significance level, the sample is consistent with a true mean of 4 carats
33
Tests Involving the t Distribution
Billy Ray has inherited large, 25,000 acre homestead
Located on outskirts of Murfreesboro, Arkansas, near:
Crater of Diamonds State Park
Prairie Creek Volcanic Pipe
Land now used for
agricultural
recreational
No official mining has taken place
34
Case Study in Statistical Analysis
Billy Ray’s Inheritance
Billy Ray must now decide upon land usage
Options:
Exploration for diamonds
Conservation
Land biodiversity and recreation
Agriculture and recreation
Land development
35
Consider Costs and Benefits of Mining
Cost and Benefits of Mining
Opportunity cost
Excessive diamond exploration damages land’s value
Exploration and Mining Costs
Benefit
Value of mineral produced
36
Consider Costs and Benefits of Mining
Cost and Benefits of Mining
Sample for geologic indicators for diamonds
kimberlite or lamproite
larger sample more likely to represent “true population”
larger sample will cost more
37
How to decide one tailed or two tailed
One tailed test
Do we change the status quo only if it's bigger than the null?
Do we change the status quo only if it's smaller than the null?
Two tailed test
Change the status quo if it's bigger or if it's smaller
38
Tests of Mean
Normal or t
population normal, known variance, small sample: Normal
population normal, unknown variance, small sample: t
large sample: Normal
39
Difference Normal and t
[Figure: normal and t densities plotted from −5 to 5; t has a "fatter" tail than the normal bell curve]
40
Hypothesis and Sample
Need at least 30 g/m³ to mine
Null hypothesis
Ho: µ = 30
Alternative hypothesis H1: ?
Sample data: n = 16 (holes drilled)
X close to normal
X̄ = 31 g/m³
variance of the mean (ŝ²/n) = 0.268 (g/m³)²
41
Normal or t?
One tailed
Null hypothesis
Ho: µ = 30
Alternative hypothesis H1: µ > 30
Sample data: n = 16 (holes drilled)
X̄ = 31 g/m³
variance ŝ² = 4.29 (g/m³)²
standard deviation ŝ = 2.07 g/m³
small sample, estimated variance, X close to normal
not exactly t but close if X close to normal
42
Tests Involving the t Distribution
tn-1 = X - µ
ŝ/n
t16-1
 =0
Reject
5%
tc=1.75
43
Tests Involving the t Distribution
tn-1 = X - µ = (31 - 30) = 1.93
ŝ/n
2.07/ 16
t16-1
 =0
Reject
5%
tc=1.75
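The arithmetic of this one-tailed t test can be reproduced in a few lines. A minimal sketch: the variable names are illustrative, and the critical value is simply copied from the slide (tc = 1.75 for 15 degrees of freedom at 5%) rather than computed.

```python
from math import sqrt

# Drill-hole sample from the slide (names are illustrative).
x_bar = 31.0   # sample mean, g/m^3
mu_0 = 30.0    # mean under Ho, g/m^3
s_hat = 2.07   # sample standard deviation, g/m^3
n = 16         # holes drilled
t_c = 1.75     # tabulated critical value quoted on the slide (15 df, 5%)

t_stat = (x_bar - mu_0) / (s_hat / sqrt(n))

print(f"t = {t_stat:.2f} with {n - 1} degrees of freedom")
print("reject Ho" if t_stat > t_c else "do not reject Ho")
```

Since 1.93 exceeds 1.75, the statistic falls in the rejection region shown in the figure.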
44
Wells produce oil
X = API gravity
approximately normal with mean 37
periodically test to see if the mean has changed
if too heavy or too light, revise contract
Ho:
H1:
Sample of 9 wells, X̄ = 38, ŝ = 2
What is test statistic?
Normal or t?
45
Two tailed t test on mean
tn-1 = X - µ
ŝ/n
 =0
Reject
/2%
Reject
/2%
tc
tc
46
Two tailed t test on mean
Ho: µ = 37
H1: µ ≠ 37
Sample of 9 wells, X̄ = 38, ŝ = 2, α = 10%
$t_{n-1} = \dfrac{\bar{X} - \mu}{\hat{s}/\sqrt{n}} = \dfrac{38 - 37}{2/\sqrt{9}} = 1.5$
47
P-values - one tailed test
Level of significance for a sample statistic under null
Smallest α for which statistic would reject null
$t_{16-1} = \dfrac{\bar{X} - \mu}{\hat{s}/\sqrt{n}} = \dfrac{31 - 30}{2.07/\sqrt{16}} = 1.93$
P = 0.04
=TDIST(1.93,15,1)
48
P-value two tailed test
Ho: µ = 37
H1: µ ≠ 37
Sample of 9 wells, X̄ = 38, ŝ = 2, α = 10%
$t_{n-1} = \dfrac{\bar{X} - \mu}{\hat{s}/\sqrt{n}} = \dfrac{38 - 37}{2/\sqrt{9}} = 1.5$
=TDIST(1.5,8,2) = 0.172
49
Formal Representation of p-Values
p-Value < α ⇒ Reject Ho
p-Value > α ⇒ Fail to reject Ho
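The two p-values quoted on the preceding slides can be approximated without a spreadsheet. This is a sketch only: it integrates the Student t density numerically with Simpson's rule, and the helper names `t_pdf` and `t_upper_tail` are hypothetical, not from the slides. A statistics package would normally be used instead.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# One-tailed example: t = 1.93 with 15 degrees of freedom.
p_one = t_upper_tail(1.93, 15)
# Two-tailed example: t = 1.5 with 8 degrees of freedom.
p_two = 2 * t_upper_tail(1.5, 8)

print(f"one-tailed p = {p_one:.3f}, two-tailed p = {p_two:.3f}")
print("mine example:", "reject Ho" if p_one < 0.05 else "fail to reject Ho")
print("oil example:", "reject Ho" if p_two < 0.10 else "fail to reject Ho")
```

The decision rule is the one stated above: reject Ho only when the p-value falls below the chosen α.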
50
More tests
Survey: - Ranking refinery managers
Daily refinery production
Samples from two refineries: n = 40 and n = 35 (production in 1000 b/cd)
First refinery: mean = 74, stand. dev. = 8
Second refinery: mean = 78, stand. dev. = 7
Questions: difference of means?
variances?
differences of variances
Again Statistics Can Help!!!!
51
Differences of Means
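Ho: µ1 − µ2 = 0
H1: µ1 − µ2 ≠ 0
$Z = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$
X1 and X2 normal, known variance
or large sample known variance
α = 10%
[Figure: standard normal density with rejection area 5% in each tail, beyond −Zc and Zc]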
52
Differences of Means
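Ho: µ1 − µ2 = 0
H1: µ1 − µ2 ≠ 0
n1 = 40, n2 = 35
X̄1 = 74, σ1 = 8
X̄2 = 78, σ2 = 7
$Z = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \dfrac{74 - 78}{\sqrt{\dfrac{8^2}{40} + \dfrac{7^2}{35}}} = -2.31$
[Figure: standard normal density with rejection area 5% in each tail, beyond −Zc = −1.645 and Zc = 1.645]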
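Plugging the refinery inputs into the difference-of-means statistic is easy to verify by machine. A minimal sketch with Python's standard library: the variable names are illustrative, and the standard deviations are treated as known, as the slide assumes.

```python
from math import sqrt
from statistics import NormalDist

# Refinery samples from the slide (names are illustrative).
n1, x1, sd1 = 40, 74.0, 8.0
n2, x2, sd2 = 35, 78.0, 7.0
alpha = 0.10   # two-tailed

# Z statistic for Ho: mu1 - mu2 = 0 and the two-tailed critical value.
z = (x1 - x2) / sqrt(sd1**2 / n1 + sd2**2 / n2)
z_c = NormalDist().inv_cdf(1 - alpha / 2)

print(f"Z = {z:.3f}, critical values = +/-{z_c:.3f}")
print("reject Ho" if abs(z) > z_c else "do not reject Ho")
```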
53
Difference of Means
X normal
Unknown but equal variances
Do above test with
$t_{n_1+n_2-2} = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{(n_1-1)\hat{s}_1^2 + (n_2-1)\hat{s}_2^2}{n_1+n_2-2}\left(\dfrac{n_1+n_2}{n_1 n_2}\right)}}$
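As an illustration of the pooled-variance statistic, the sketch below applies it to the same two refinery samples. Reusing those numbers here is an assumption made for the example, since the slide gives only the formula. Variable names are illustrative.

```python
from math import sqrt

# Refinery samples reused for illustration (an assumption, not from this slide).
n1, x1, s1 = 40, 74.0, 8.0
n2, x2, s2 = 35, 78.0, 7.0

# Pooled variance estimate under the equal-variance assumption.
pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
std_err = sqrt(pooled_var * (n1 + n2) / (n1 * n2))

t_stat = (x1 - x2) / std_err
df = n1 + n2 - 2

print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

The statistic is then compared with a t critical value for n1 + n2 − 2 degrees of freedom.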
54
Variance test (χ² distribution)
$\chi^2 = \dfrac{(n-1)\hat{S}^2}{\sigma^2}$
Two tailed
[Figure: χ² density with rejection area α/2 in each tail]
55
Variance test (χ² distribution)
$\chi^2 = \dfrac{(n-1)\hat{S}^2}{\sigma^2}$
One tailed
[Figure: χ² density with rejection area α in the upper tail]
56
Hypothesis Test on Variance
Suppose best practice in refinery is σ = 6
Does refinery 2 have different variability than best practice?
Ho: σ² = 6²
H1: σ² ≠ 6²
Example: 2nd refinery, n − 1 = 34, standard deviation = 7
$P\left(\chi^2_{c1} \le \dfrac{(n-1)\hat{S}^2}{\sigma^2} \le \chi^2_{c2}\right) = 1 - \alpha$
57
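Hypothesis Test on Variance
Ho: σ² = 6²
H1: σ² ≠ 6²
Example: 2nd refinery, n − 1 = 34, standard deviation = 7
α = 10%
$\dfrac{(n-1)\hat{S}^2}{\sigma^2} = \dfrac{(35-1)\,7^2}{6^2} = 46.278$
$P\left(\chi^2_{c1} \le \dfrac{(n-1)\hat{S}^2}{\sigma^2} \le \chi^2_{c2}\right) = 1 - \alpha$
[Figure: χ² density with rejection area α/2 in each tail, below $\chi^2_{c1}$ and above $\chi^2_{c2}$]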
58
Hypothesis Test on Variance
Suppose best practice in refinery is σ = 6
Ho: σ² = 6²
H1: σ² ≠ 6²
Example: 2nd refinery, n − 1 = 34, standard deviation = 7
chiinv(0.95,34) = 21.664, chiinv(0.05,34) = 48.602
59
Variance test (χ² distribution)
$\chi^2 = \dfrac{(n-1)\hat{S}^2}{\sigma^2} = 46.278$
Two tailed
[Figure: χ² density with rejection area 0.05 in each tail, below 21.664 and above 48.602]
60
Variance test (χ² distribution)
More variance than best practice
Ho: σ² = 6²
H1: σ² > 6²
One tailed
[Figure: χ² density with rejection area 0.10 in the upper tail]
61
Variance test (χ² distribution)
More variance than best practice
Ho: σ² = 6²
H1: σ² > 6²
One tailed
$\chi^2 = \dfrac{(n-1)\hat{S}^2}{\sigma^2} = 46.278$
[Figure: χ² density with rejection area 0.10 in the upper tail]
chiinv(0.10,34) = 44.903
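The variance test can be sketched as follows. Two points are assumptions: the best-practice standard deviation is taken as 6, because the slide's arithmetic divides by 6², and the critical values are copied from the CHIINV results quoted on the slides rather than computed. Variable names are illustrative.

```python
# Second refinery sample from the slides (names are illustrative).
n = 35
s_hat = 7.0      # sample standard deviation
sigma_0 = 6.0    # best-practice standard deviation (assumed from the slide's arithmetic)

# Critical values quoted on the slides for 34 degrees of freedom.
lower_two, upper_two = 21.664, 48.602   # two-tailed, 5% in each tail
upper_one = 44.903                      # one-tailed, 10% in the upper tail

chi2_stat = (n - 1) * s_hat**2 / sigma_0**2

two_tailed_reject = chi2_stat < lower_two or chi2_stat > upper_two
one_tailed_reject = chi2_stat > upper_one

print(f"chi-square = {chi2_stat:.3f}")
print("two-tailed:", "reject Ho" if two_tailed_reject else "do not reject Ho")
print("one-tailed:", "reject Ho" if one_tailed_reject else "do not reject Ho")
```

The same statistic leads to different decisions here because the one-tailed test puts the whole 10% in the upper tail.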
62
Testing if Variances the Same: F Distribution
2 samples of size n1 and n2
sample variances: ŝ1², ŝ2²
Ho: σ1² = σ2² => Ho: σ2²/σ1² = 1
H1: σ1² ≠ σ2² => H1: σ2²/σ1² ≠ 1
$F = \dfrac{\hat{S}_1^2/\sigma_1^2}{\hat{S}_2^2/\sigma_2^2} = \dfrac{\hat{S}_1^2\,\sigma_2^2}{\hat{S}_2^2\,\sigma_1^2}$ is $F_{n_1-1,\,n_2-1}$
63
Testing if Variances the Same: F Distribution
Ho: σ1²/σ2² = 1
H1: σ1²/σ2² ≠ 1
Statistic: $\hat{S}_1^2/\hat{S}_2^2$
Two tailed
[Figure: F density with rejection area α/2 in each tail]
64
Testing if Variances the Same: F Distribution
Ho: σ2²/σ1² = 1
H1: σ2²/σ1² > 1
Statistic: $\hat{S}_1^2/\hat{S}_2^2$
One tailed
α = 10%
[Figure: F density with rejection area α in one tail]
65
Example Testing if Variances the Same
2 samples of size n1 = 40 and n2 = 35
sample variances: ŝ1² = 8², ŝ2² = 7²
Ho: σ2²/σ1² = 1
H1: σ2²/σ1² ≠ 1
$P\left(\text{Finv}(0.95, 39, 34) \le \dfrac{\hat{S}_1^2\,\sigma_2^2}{\hat{S}_2^2\,\sigma_1^2} \le \text{Finv}(0.05, 39, 34)\right) = 1 - 0.10$
Critical values: [0.579, 1.749]
8²/7² = 1.306
66
Testing if Variances the Same: F Distribution
Ho: σ1²/σ2² = 1
H1: σ1²/σ2² ≠ 1
$\hat{S}_1^2/\hat{S}_2^2 = 1.306$
Two tailed
[Figure: F density with rejection area 0.05 in each tail, below Finv(0.95,39,34) = 0.579 and above Finv(0.05,39,34) = 1.749]
67
Testing if Variances the Same: F Distribution
Ho: σ2²/σ1² = 1
H1: σ2²/σ1² > 1
$\hat{S}_1^2/\hat{S}_2^2 = 1.306$
One tailed
[Figure: F density with rejection area 0.10 in the upper tail, above Finv(0.10,39,34) = 1.544]
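The F ratio and the comparisons from the last few slides can be sketched as below. The critical values are copied from the FINV results quoted on the slides, not computed. Variable names are illustrative.

```python
# Sample variances of the two refineries (names are illustrative).
s1_sq = 8.0**2   # n1 = 40
s2_sq = 7.0**2   # n2 = 35

# Critical values quoted on the slides for (39, 34) degrees of freedom.
f_lower, f_upper = 0.579, 1.749   # two-tailed, 5% in each tail
f_upper_one = 1.544               # one-tailed, 10% in the upper tail

# Under Ho the population variances cancel, leaving the ratio of sample variances.
f_stat = s1_sq / s2_sq

two_tailed_reject = f_stat < f_lower or f_stat > f_upper
one_tailed_reject = f_stat > f_upper_one

print(f"F = {f_stat:.3f}")
print("two-tailed:", "reject Ho" if two_tailed_reject else "do not reject Ho")
print("one-tailed:", "reject Ho" if one_tailed_reject else "do not reject Ho")
```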
68
Power of a test
Type II error:
β = P(Fail to reject Ho | H1 is true)
Power = 1 − β
[Figure: two sampling distributions centered at 0 (µ = 4) and 2 (µ = 6), showing β and the power 1 − β]
70
Power of a test
Researcher controls level of significance, α
Increase α: what happens to ß?
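The trade-off between α and ß can be explored numerically for a two-tailed z test. This is a sketch under stated assumptions: µ0 = 4, σ = 1 and n = 100 come from the diamond example, while the alternative mean of 4.3 and the function name `type_ii_error` are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(mu_0, mu_1, sigma, n, alpha):
    """beta for a two-tailed z test of Ho: mu = mu_0 when the true mean is mu_1."""
    std = NormalDist()
    z_c = std.inv_cdf(1 - alpha / 2)
    # Shift of the standardized statistic when the true mean is mu_1.
    shift = (mu_1 - mu_0) / (sigma / sqrt(n))
    # Probability that the statistic still lands in the non-rejection region.
    return std.cdf(z_c - shift) - std.cdf(-z_c - shift)

# Hypothetical alternative mean of 4.3 carats.
betas = {a: type_ii_error(4.0, 4.3, 1.0, 100, a) for a in (0.01, 0.05, 0.10)}
for a, b in betas.items():
    print(f"alpha = {a:.2f}: beta = {b:.3f}, power = {1 - b:.3f}")
```

In this sketch a larger α gives a smaller ß, and a true mean farther from µ0 drives ß toward zero.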
71
Raise Type I (α)
What happens to Type II (ß)?
Ho: µ = 4 not true
µ = 6 true
X̄ − 4 has mean 2, not mean 0
[Figure: two curves centered at 0 (µ = 4) and 2 (µ = 6), with area ß marked]
72
Higher α
What happens to Type II?
[Figure: two curves centered at 0 (µ = 4) and 2 (µ = 6), with area ß marked]
Increase ß, reduce α
73
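Operating Characteristic Curve
[Figure: sampling distributions under H0 (µ = µ0) and H1 (µ = µ1) on an axis from −10 to 10; ß is the area under the H1 curve up to Zβ]
Can graph β against µ
called operating characteristic curve
useful in experimental design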
74
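Operating Characteristic Curve
[Figure: two panels on an axis from −10 to 10, each showing the H0 curve (µ = µ0) with an H1 curve, first at µ = µ1 and then at µ = µ2; ß and Zβ are marked in each panel]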
75
Fitting a probability distribution
Is electricity demand a log-normal distribution
Observed Mean: 18.42
Observed Variance: 43
Observations: 20
 9.8261  20.8787  35.6834  13.1139  15.9879
13.2253  20.2954  18.1785  24.3539  16.4685
30.2449  14.182   20.275   17.243   12.8461
 9.2554  23.3099  17.2652  21.9764  13.9045
76
Fitting a probability distribution
Does electricity demand follow a normal distribution?
Observed Mean: 18.42
Observed Variance: 43
Observations: 20
(same 20 observations as on the previous slide)
77
You can test your model graphically:
1. Order observations from smallest Y1 to largest Yn
2. Compute cumulative frequency distribution
3. Plot ordered observations versus Pi
on special probability sheet
4. If straight line within critical range
can’t reject normal
78
You can test your model graphically:
Yi       Pi        Yi       Pi
 9.26    0.05      17.27    0.55
 9.83    0.10      18.18    0.60
12.85    0.15      20.28    0.65
13.11    0.20      20.30    0.70
13.23    0.25      20.88    0.75
13.90    0.30      21.98    0.80
14.18    0.35      23.31    0.85
15.99    0.40      24.35    0.90
16.47    0.45      30.24    0.95
17.24    0.50      35.68    1.00
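The first two steps of the graphical check (ordering and cumulative frequencies) can be sketched in a few lines. The cumulative frequency is taken as i/n, which matches the 0.05 to 1.00 column on the slide. Variable names are illustrative.

```python
# Electricity demand observations from the earlier slide.
data = [
    9.8261, 20.8787, 35.6834, 13.1139, 15.9879,
    13.2253, 20.2954, 18.1785, 24.3539, 16.4685,
    30.2449, 14.182, 20.275, 17.243, 12.8461,
    9.2554, 23.3099, 17.2652, 21.9764, 13.9045,
]

# Step 1: order from smallest to largest.
ordered = sorted(data)
n = len(ordered)

# Step 2: cumulative frequency P_i = i / n for the i-th ordered value.
table = [(y, (i + 1) / n) for i, y in enumerate(ordered)]

for y, p in table:
    print(f"{y:6.2f}  {p:.2f}")
```

Plotting these pairs on normal probability paper (step 3) is left to a plotting tool.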
79
Or use the Graph/Probability Plot …
Option in Minitab
80
Statistical test of distribution
Ho: X ~ N(µ, σ²)
H1: X does not follow N(µ, σ²)
Order data
Estimate sample mean & variance
Observed Mean: 18.42
Observed Variance: 43
Observations: 20
χ² statistic: goodness of fit of model
81
Statistical test of distribution
Again order sample
 9.26   9.83  12.85  13.11  13.23  13.90  14.18  15.99  16.47  17.24
17.27  18.18  20.28  20.30  20.88  21.98  23.31  24.35  30.24  35.68
Create m = 5 categories
<10, 10-15, 15-20, 20-25, >25
82
Statistical test of distribution
(ordered sample as on the previous slide)
Actual frequencies
Category    Frequency
<10         2
10-15       5
15-20       5
20-25       6
>25         2
83
Statistical test of distribution
Frequencies
Category    Actual    Expected
<10         2         Normdist(10,18.42,6.56,1)*20
10-15       5         (Normdist(15,18.42,6.56,1) - Normdist(10,18.42,6.56,1))*20
15-20       5         (Normdist(20,18.42,6.56,1) - Normdist(15,18.42,6.56,1))*20
20-25       6
>25         2
84
Statistical test of distribution
Frequencies
Category    Observed    Expected
<10         2           1.99
10-15       5           4.03
15-20       5           5.88
20-25       6           4.94
>25         2           3.16
85
χ² Goodness of Fit Test
Is based on:
$\chi^2 = \sum_{i=1}^{m} \dfrac{(o_i - e_i)^2}{e_i}$
df = m – k – 1
k = number of parameters replaced by estimates
oi: observed frequency, ei: expected frequency
86
Statistical test of distribution
Frequencies
Category    oi    ei      (oi − ei)²/ei
<10         2     1.99    (2 − 1.99)²/1.99
10-15       5     4.03    + (5 − 4.03)²/4.03
15-20       5     5.88    + (5 − 5.88)²/5.88
20-25       6     4.94    + (6 − 4.94)²/4.94
>25         2     3.16    + (2 − 3.16)²/3.16
χ² = Σ(oi − ei)²/ei = 1.02
87
Statistical test of distribution
Ho: X ~ N(µ, σ²)
H1: X does not follow N(µ, σ²)
df = m – k – 1 = 5 – 2 – 1 = 2
χ² = Σ(oi − ei)²/ei = 1.02
CHIINV(0.05,2) = 5.99
1.02 < 5.99: can't reject normal
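The whole goodness-of-fit calculation can be reproduced with the standard library. A minimal sketch: the expected frequencies come from a normal model with the estimated mean 18.42 and standard deviation 6.56, and the variable names are illustrative. For 2 degrees of freedom the χ² upper-tail probability is exactly exp(−x/2), so no table is needed. Small differences from hand-rounded slide figures are possible.

```python
from math import exp, log
from statistics import NormalDist

observed = [2, 5, 5, 6, 2]    # counts for <10, 10-15, 15-20, 20-25, >25
edges = [10, 15, 20, 25]      # category boundaries
n = sum(observed)
model = NormalDist(mu=18.42, sigma=6.56)   # estimated mean and standard deviation

# Expected count per category = n * probability of the category under the model.
cdf = [0.0] + [model.cdf(e) for e in edges] + [1.0]
expected = [n * (hi - lo) for lo, hi in zip(cdf, cdf[1:])]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 2 - 1    # m - k - 1 with k = 2 estimated parameters

# Exact results for df = 2: P(chi-square > x) = exp(-x / 2).
critical = -2 * log(0.05)
p_value = exp(-chi2 / 2)

print("expected:", [round(e, 2) for e in expected])
print(f"chi-square = {chi2:.2f}, df = {df}, critical = {critical:.2f}, p = {p_value:.2f}")
print("reject normal" if chi2 > critical else "can't reject normal")
```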
89
Sum Up Chapter 7
Hypothesis testing
null vs alternative
null with equal sign
null often status quo
alternative often what want to prove
type I error vs type II error
type I called level of significance
P – values
1-ß = power of test
= probability of rejecting false
one tailed vs two tailed
90
Sum Up Chapter 7
Hypothesis tests
mean – Normal test
population normal, known variance
large sample
$\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
mean – t test
population normal, unknown variance, small sample
$\dfrac{\bar{X} - \mu}{\hat{s}/\sqrt{n}}$
91
Sum Up Chapter 7
Normal and t
92
Sum Up Chapter 7
Hypothesis tests
difference of means – Normal test
population normal, known variance
$\dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$
93
Sum Up Chapter 7
Hypothesis tests
variance
$\chi^2 = \dfrac{(n-1)\hat{S}^2}{\sigma^2}$
Are variances equal
$\dfrac{\hat{S}_1^2\,\sigma_2^2}{\hat{S}_2^2\,\sigma_1^2}$ is $F_{n_1-1,\,n_2-1}$
94
Sum Up Chapter 7
χ² and F
95
Sum Up Chapter 7
How is random variable distributed
normal – graph cumulative frequency distribution
special paper
straight line
Statistical
$\chi^2_{k-m-1} = \sum \dfrac{(o_i - e_i)^2}{e_i}$
k = categories
m = estimated parameters
always 1 tailed
End of Chapter 7!