1
ECO 72 - INTRODUCTION TO ECONOMIC
STATISTICS
Topic 8
Confidence
Intervals
These slides are copyright © 2003 by Tavis Barr. This material may
be distributed only subject to the terms and conditions set forth in the
Open Publication License, v1.0 or later (the latest version is presently
available at http://www.opencontent.org/openpub/).
2
Confidence Intervals
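Assume a random interval
$$I = \left[\bar{X} - \frac{\sigma}{\sqrt{n}},\; \bar{X} + \frac{\sigma}{\sqrt{n}}\right]$$
Then,
$$P[\mu \in I] = P\left[\bar{X} - \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + \frac{\sigma}{\sqrt{n}}\right] = P\left[-1 \le \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \le 1\right] = P[-1 \le Z \le 1] = 0.68$$
since
$$Z = \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \approx N(0,1)$$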
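As a quick numerical check of the 0.68 figure, the probability that a standard normal variable lies within k standard deviations of zero can be computed from the error function. This is only an illustrative sketch; the function name `central_prob` is an assumption and does not come from the slides.

```python
import math

def central_prob(k: float) -> float:
    """P(-k <= Z <= k) for a standard normal Z, via the error function."""
    return math.erf(k / math.sqrt(2))

print(round(central_prob(1.0), 4))  # 0.6827
```

The slide's 0.68 is this value rounded to two decimals.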
3
Confidence Intervals
• So for every sample we have probability 68% to
create an interval which includes μ.
• In other words 68% of all the possible intervals
we can create by selecting a different sample
would contain the true value μ.
• If in the previous example n = 50, X̄ = 1 and σ = 1,
then σ/√n ≈ 0.14 and I = [0.86, 1.14].
• The true value of μ may or may not be in that
interval.
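The "68% of all possible intervals" claim can be illustrated by simulation. The sketch below assumes normally distributed data with known σ and draws many samples of size n = 50; the names `half_width` and `coverage`, the seed, and the number of trials are illustrative choices, not part of the slides.

```python
import math
import random

def half_width(sigma: float, n: int) -> float:
    """Half-width sigma / sqrt(n) of the interval around the sample mean."""
    return sigma / math.sqrt(n)

def coverage(mu=1.0, sigma=1.0, n=50, trials=10000, seed=0):
    """Fraction of simulated intervals [xbar - h, xbar + h] that contain mu."""
    rng = random.Random(seed)
    h = half_width(sigma, n)
    hits = 0
    for _ in range(trials):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - h <= mu <= xbar + h:
            hits += 1
    return hits / trials

print(round(half_width(1.0, 50), 4))  # 0.1414
print(coverage())                     # a value close to 0.68
```

Any single interval either contains μ or it does not; the 68% describes how often the procedure succeeds over repeated samples.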
4
Confidence Intervals
• We are only 68% confident that μ is in that
interval.
• So, we would like to increase confidence, raising
the 68% to 90%, 95% or even 99%.
• This has a cost: higher confidence requires a
wider interval.
5
Confidence Intervals
[Figure: normal density f(z) plotted against z, horizontal axis marked from μ−2σ to μ+2σ; P(−1.645 < z < 1.645) = 0.9]
● A normally distributed variable with mean μ
and std. deviation σ will be between μ − 1.645σ
and μ + 1.645σ 90 percent of the time.
● This is called the 90 percent confidence
interval.
6
Confidence Intervals
[Figure: normal density f(z) plotted against z, horizontal axis marked from μ−2σ to μ+2σ; P(−1.96 < z < 1.96) = 0.95]
● A normally distributed variable with mean μ
and standard deviation σ will be between μ − 1.96σ
and μ + 1.96σ 95 percent of the time.
● This is called the 95 percent confidence
interval.
7
Confidence Intervals
[Figure: normal density f(z) plotted against z, horizontal axis marked from μ−2σ to μ+2σ; P(−2.576 < z < 2.576) = 0.99]
● A normally distributed variable with mean μ
and std deviation σ will be between μ − 2.576σ
and μ + 2.576σ 99 percent of the time.
● This is called the 99 percent confidence
interval.
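The multipliers 1.645, 1.96 and 2.576 are quantiles of the standard normal distribution. A minimal sketch using Python's standard library follows; the helper name `z_critical` is an assumption introduced for illustration.

```python
from statistics import NormalDist

def z_critical(confidence: float) -> float:
    """Two-sided critical value z with P(-z < Z < z) = confidence."""
    # Leave (1 - confidence) / 2 in each tail of the standard normal.
    return NormalDist().inv_cdf((1 + confidence) / 2)

for level in (0.90, 0.95, 0.99):
    print(level, round(z_critical(level), 3))
```

Rounded to three decimals, the three values agree with the 1.645, 1.96 and 2.576 used on the slides.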
8
Example of a Confidence Interval
● A survey of 2,938 clients of homeless service
programs found that the average client earned
$367 per month, with a std. deviation of $354.
Source: http://www.huduser.org/publications/homeless/homelessness/ch_2e.html
● Standard error: 354/2938^0.5 = 354/54.2 = 6.53

       Lower Bound of C.I.           Upper Bound of C.I.
90%    367 – 1.645(6.53) = 356.26    367 + 1.645(6.53) = 377.74
95%    367 – 1.960(6.53) = 354.20    367 + 1.960(6.53) = 379.80
99%    367 – 2.576(6.53) = 350.18    367 + 2.576(6.53) = 383.82

90%: 356.26 to 377.74
95%: 354.20 to 379.80
99%: 350.18 to 383.82
[Figure: number line from 350 to 390]
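The large-sample recipe (sample mean plus or minus a normal multiplier times the standard error) can be sketched as a small function. The name `mean_ci` and its parameters are assumptions made for illustration; the inputs are the survey figures quoted above.

```python
import math
from statistics import NormalDist

def mean_ci(xbar: float, sd: float, n: int, confidence: float):
    """Large-sample confidence interval for a population mean."""
    se = sd / math.sqrt(n)                          # standard error of the mean
    z = NormalDist().inv_cdf((1 + confidence) / 2)  # normal multiplier
    return xbar - z * se, xbar + z * se

# Survey figures from the slide: mean $367, s.d. $354, n = 2,938.
for level in (0.90, 0.95, 0.99):
    lo, hi = mean_ci(367, 354, 2938, level)
    print(f"{level:.0%}: {lo:.2f} to {hi:.2f}")
```

The function keeps full precision for the standard error and the multiplier, so its results can differ from hand calculations in the last digit when intermediate values are rounded first.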
9
Example of a Confidence Interval
● Sample of 300 households. Mean meat
consumption is 0.4 lbs/day, std. deviation is 0.2.
● Standard error is 0.2/300^0.5 = 0.2/17.32 = 0.0115

       Lower Bound of C.I.             Upper Bound of C.I.
90%    0.4 – 1.645(0.0115) = 0.381     0.4 + 1.645(0.0115) = 0.419
95%    0.4 – 1.960(0.0115) = 0.377     0.4 + 1.960(0.0115) = 0.423
99%    0.4 – 2.576(0.0115) = 0.370     0.4 + 2.576(0.0115) = 0.430

90%: 0.381 to 0.419
95%: 0.377 to 0.423
99%: 0.370 to 0.430
[Figure: number line from 0.3 to 0.5]
10
How do we handle small
samples?
● The Central Limit Theorem approximation is
usually trusted only when the sample size is
over 30
● If the original variable is normally
distributed, the sample mean, standardized with
the sample standard deviation, will follow the
t distribution
● Even if it isn't, pretending it is may
give us some guidance
11
How do we handle small
samples?
● For large samples, we multiply the standard error
by the same number for all sample sizes:
   90%   1.645
   95%   1.96
   99%   2.576
● For the t distribution, we use a different number
depending on how many observations there are
● If our sample has n observations, then we use the
t distribution with n-1 degrees of freedom
12
How do we handle small
samples?
● Example:
– A sample of students has the following test
scores: 84, 76, 98, 34, 65, 76, 90, 92, 64, 87.
– What is a 90% confidence interval for the
population mean?
● We start by calculating the sample mean,
sample standard deviation, and standard
error the same way
How do we handle small
samples?
● Sample mean is 76.6
● Sample s.d. is 18.7
● So standard error is $18.7/\sqrt{10} = 5.91$
● There are 10 observations so we
use 10-1 = 9 degrees of freedom
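These three summary numbers can be reproduced from the ten test scores with Python's standard library. This is an illustrative sketch; the variable names are assumptions. Note that `statistics.stdev` divides by n − 1, which is the sample standard deviation used here.

```python
import math
import statistics

scores = [84, 76, 98, 34, 65, 76, 90, 92, 64, 87]

mean = statistics.mean(scores)
sd = statistics.stdev(scores)          # sample s.d. (divides by n - 1)
se = sd / math.sqrt(len(scores))       # standard error of the mean
dof = len(scores) - 1                  # degrees of freedom for the t distribution

print(round(mean, 1), round(sd, 1), round(se, 2), dof)  # 76.6 18.7 5.91 9
```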
Confidence Intervals
        80%      90%      95%      98%      99%      99.9%
Level of Significance for One-Tailed Test
        0.100    0.050    0.025    0.010    0.005    0.0005
Level of Significance for Two-Tailed Test
df      0.20     0.10     0.05     0.02     0.01     0.001
1       3.08     6.31     12.71    31.82    63.657   636.619
2       1.89     2.920    4.30     6.97     9.925    31.599
3       1.64     2.35     3.18     4.54     5.841    12.924
4       1.53     2.13     2.78     3.75     4.604    8.610
5       1.48     2.02     2.57     3.37     4.032    6.869
6       1.440    1.943    2.45     3.14     3.707    5.959
7       1.42     1.895    2.37     3.00     3.499    5.408
8       1.40     1.860    2.31     2.90     3.355    5.041
9       1.38     1.833    2.26     2.82     3.250    4.781
10      1.37     1.812    2.23     2.76     3.169    4.587
11      1.36     1.796    2.20     2.72     3.106    4.437
12      1.36     1.782    2.18     2.68     3.055    4.318
13      1.350    1.771    2.160    2.650    3.012    4.221
14      1.35     1.761    2.15     2.62     2.977    4.140
15      1.34     1.753    2.13     2.60     2.947    4.073
14
How do we handle small
samples?
● X̄ = 76.6, SE = 5.91
● 9 degrees of freedom
● 90% Confidence interval:
X̄ − 1.83×SE to X̄ + 1.83×SE
= 76.6 – 1.83(5.91) to 76.6 + 1.83(5.91)
= 65.78 to 87.42
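The small-sample interval can be sketched in code by looking up the multiplier in a table keyed by degrees of freedom. The dictionary below copies a few entries from the 90% column of the slides' t table; the names `T_90` and `t_interval_90` are assumptions made for illustration.

```python
import math
import statistics

# Two-tailed 90% critical values copied from the slides' t table (df -> t).
T_90 = {6: 1.943, 7: 1.895, 8: 1.860, 9: 1.833, 10: 1.812}

def t_interval_90(data):
    """90% confidence interval for the mean of a small sample."""
    n = len(data)
    mean = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(n)  # sample s.d. over sqrt(n)
    t = T_90[n - 1]                             # n - 1 degrees of freedom
    return mean - t * se, mean + t * se

scores = [84, 76, 98, 34, 65, 76, 90, 92, 64, 87]
lo, hi = t_interval_90(scores)
print(f"{lo:.2f} to {hi:.2f}")
```

The result may differ from the slide's figures in the second decimal place, because the slide rounds the standard error and the multiplier before multiplying.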
15
16
Another Small Sample Example
The longevity of 7 patients with a rare
cancer after metastasis:
29 weeks
67 weeks
65 weeks
42 weeks
33 weeks
97 weeks
56 weeks
What is a 95% confidence interval for
the average longevity in the population?
17
Another Small Sample Example
The longevity of 7 patients with a rare
cancer after metastasis:
Longevity     Xi − X̄      (Xi − X̄)²
29 weeks      -26.57       706.04
67 weeks       11.43       130.61
65 weeks        9.43        88.90
42 weeks      -13.57       184.18
33 weeks      -22.57       509.47
97 weeks       41.43      1716.33
56 weeks        0.43         0.18
Sum: 389 weeks             Sum: 3335.71
Mean: 55.57                Variance: 3335.71/6 = 555.95
                           Std Dev: 23.58
                           Std Err: $23.58/\sqrt{7} = 8.91$
What is a 95% confidence interval for
the average longevity in the population?
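The column-by-column calculation on this slide can be followed step by step in code. This is an illustrative sketch with assumed variable names; it uses the n − 1 divisor for the sample variance, as the slide does.

```python
import math

weeks = [29, 67, 65, 42, 33, 97, 56]
n = len(weeks)

mean = sum(weeks) / n                      # 389 / 7
deviations = [x - mean for x in weeks]     # Xi minus the sample mean
squared = [d ** 2 for d in deviations]     # squared deviations
variance = sum(squared) / (n - 1)          # sample variance, divisor n - 1
std_dev = math.sqrt(variance)
std_err = std_dev / math.sqrt(n)           # standard error of the mean

print(round(mean, 2), round(sum(squared), 2))                    # 55.57 3335.71
print(round(variance, 2), round(std_dev, 2), round(std_err, 2))  # 555.95 23.58 8.91
```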
18
Another Small Sample Example
Longevity of 7 patients with a rare cancer after
metastasis
Sample Mean: 55.57
Std Error: 8.91
dof: 6
95% confidence interval:
55.57 ± 2.45(8.91)
= 33.74 to 77.40
19
Confidence Interval for
Population Proportion
● A proportion is simply the fraction of
responses in a dataset that equal a certain
value
– For a dummy (0/1, yes/no) variable, the
fraction of “yes” or “1”
– For a category variable, e.g., the brand of car
that a respondent drives, what percent drive a
Buick?
– For a more general discrete variable, e.g.,
what percentage of people have exactly two
children?
20
Confidence Interval for
Population Proportion
● All proportion variables can be thought of
or re-cast as dummy variables
– 1 for “Drives a Buick”
0 for “Doesn't drive a Buick”
– 1 for “Has exactly two children”
0 for “Doesn't have exactly two children”
21
Confidence Interval for
Population Proportion
● Consider the question: “In a sample
of size n of a dummy variable, what is
the probability that we observe the
value “1” k times?”
● This is a Binomial probability
● So a sample proportion is basically a
Binomial variable divided by n
22
Confidence Interval for
Population Proportion
● A sample proportion is basically a Binomial
variable divided by n
● Remember that as n gets big, a Binomial
variable approximates a Normal with mean
np and standard deviation $\sqrt{np(1-p)}$
● So if we divide the variable by n, we get a
Normal variable with mean p and standard
deviation
$$\frac{\sqrt{np(1-p)}}{n} = \sqrt{\frac{p(1-p)}{n}}$$
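The mean p and standard deviation √(p(1−p)/n) of a sample proportion can be checked exactly from the Binomial probabilities. The sketch below is illustrative only; the function name `proportion_moments` and the values n = 100, p = 0.3 are assumptions chosen for the demonstration.

```python
import math

def proportion_moments(n: int, p: float):
    """Exact mean and s.d. of K/n, where K is Binomial(n, p), from the pmf."""
    pmf = [math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    mean = sum((k / n) * w for k, w in enumerate(pmf))
    var = sum((k / n - mean) ** 2 * w for k, w in enumerate(pmf))
    return mean, math.sqrt(var)

n, p = 100, 0.3                         # hypothetical values for the check
mean, sd = proportion_moments(n, p)
formula_sd = math.sqrt(p * (1 - p) / n)
print(mean, sd, formula_sd)
```

The mean and standard deviation hold exactly for any n; what improves as n gets big is the Normal shape of the distribution.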
23
Confidence Interval for
Population Proportion
● If we divide the variable by n, we get a
Normal variable with mean p and standard
deviation $\sqrt{np(1-p)}/n = \sqrt{p(1-p)/n}$
● So the expected value of the sample
proportion is the population proportion,
and its standard error is $\sqrt{p(1-p)/n}$
● We can use this expected value and
standard error to generate confidence
intervals
● Requirement: np ≥ 5 and np(1-p) ≥ 5
24
Confidence Interval for
Population Proportion
● Example: Suppose we decide that only 1% of our
televisions should break within a year.
● We do a survey of 500 consumers and find that 8
have broken within the first year.
● What is the 90% confidence interval for the
proportion that break within a year?
25
Confidence Interval for
Population Proportion
● Example: A Zogby poll of 2,246 adults found that
83% think text messaging while driving should be
illegal.
Source: http://www.zogby.com/news/ReadNews.dbm?ID=1323
● What is a 90 percent confidence interval for the
fraction of adults that thinks text messaging
while driving should be illegal?
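One way to work this question is to apply the proportion standard error √(p(1−p)/n) with the 90% normal multiplier. The sketch below is an illustration under that approach, not a solution given on the slides; the function name `proportion_ci` is an assumption, and the check inside it is the np ≥ 5 and np(1-p) ≥ 5 requirement stated earlier.

```python
import math
from statistics import NormalDist

def proportion_ci(p_hat: float, n: int, confidence: float):
    """Normal-approximation confidence interval for a population proportion."""
    # Requirement from the slides: np >= 5 and np(1-p) >= 5.
    if n * p_hat < 5 or n * p_hat * (1 - p_hat) < 5:
        raise ValueError("sample too small for the Normal approximation")
    se = math.sqrt(p_hat * (1 - p_hat) / n)         # standard error
    z = NormalDist().inv_cdf((1 + confidence) / 2)  # normal multiplier
    return p_hat - z * se, p_hat + z * se

# Poll figures from the slide: 83% of 2,246 adults.
lo, hi = proportion_ci(0.83, 2246, 0.90)
print(f"{lo:.3f} to {hi:.3f}")
```

With these inputs the standard error is about 0.0079, so the interval is roughly 0.83 plus or minus 0.013.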
26
Confidence Interval for
Population Proportion
● Example: Suppose we decide that only 1% of our
televisions should break within a year.
● We do a survey of 500 consumers and find that 8
have broken within the first year.
● What is the 90% confidence interval for the
proportion that break within a year?
– Proportion is p = 8/500 = 0.016
– Standard error is $\sqrt{p(1-p)/n} = \sqrt{0.016 \times 0.984/500} = \sqrt{0.000031488} = 0.0056$
27
Confidence Interval for
Population Proportion
● What is the 90% confidence interval for the
proportion that break within a year?
– Proportion is p = 8/500 = 0.016
– Std Error: $\sqrt{p(1-p)/n} = \sqrt{0.016 \times 0.984/500} = 0.0056$
– 90% confidence interval? Same method as
before:
Lower Bound: 0.016 – 1.645(0.0056) = 0.0068
Upper Bound: 0.016 + 1.645(0.0056) = 0.0252
[Figure: number line from 0.000 to 0.030 showing the interval 0.016 ± 1.645 x 0.0056]
28
Confidence Interval for
Population Proportion
● A sample of 1000 likely voters finds that
560 support a campaign finance reform
referendum
● What is a 99 percent confidence
interval for the percentage of voters
supporting the referendum?
29
Confidence Interval for
Population Proportion
● A sample of 1000 likely voters finds that 560
support a campaign finance reform referendum
● What is a 99 percent confidence interval for the
percentage of voters supporting the referendum?
– Sample proportion: 560/1000 = 0.56
– Standard error:
$\sqrt{p(1-p)/n} = \sqrt{0.56 \times (1-0.56)/1000} = \sqrt{0.0002464} = 0.0157$
30
Confidence Interval for
Population Proportion
● What is a 99 percent confidence interval
for the percentage of voters supporting the
referendum?
– Sample proportion: 560/1000 = 0.56
– Standard error: $\sqrt{0.56 \times 0.44/1000} = 0.0157$
– 99% Confidence Interval: 0.56 ± 2.576(0.0157)
= 0.5196 to 0.6004
[Figure: number line from 0.50 to 0.62 showing the interval 0.56 ± 2.58 x 0.0157]