LESSON SEVEN
CONFIDENCE INTERVALS FOR MEANS AND PROPORTIONS
An interval estimate for μ of the form $\bar{x}$ ± (a margin of error) would
provide the user with a measure of the uncertainty associated with the
point estimate. One would expect the formula for the margin of
error to take into consideration the factors that determine the
variation in values of the point estimate, such as the sample size n and
the population standard deviation, σ.
Using the Central Limit Theorem (n ≥ 30), we can calculate the interval that
contains 95% of all sample means:
$$\mu \pm 1{,}96\,\sigma_{\bar{x}} \quad\text{or}\quad \mu \pm \varepsilon, \ \text{where } \varepsilon = 1{,}96\,\sigma_{\bar{x}}.$$
The interval μ ± ε is described as the interval with a fixed centre, μ, and
total width w = 2 × ε, that contains 95% of all sample means.
In estimation μ is unknown. Therefore replace μ by a point estimate, $\bar{x}$.
Substituting $\bar{x}$ for μ gives the interval
$$\bar{x} \pm 1{,}96\,\sigma_{\bar{x}} = \bar{x} \pm \varepsilon$$
The essential difference between the two equations is that in the former
the centre of the interval is fixed at μ, but in the second the centre of the
interval is no longer fixed: the centre moves according to the value of the
new point estimate, 𝑥̅ .
An interval estimate $\bar{x} \pm \varepsilon$ will contain μ if the sample mean, $\bar{x}$, is one of
the 95% of sample means that lie within the interval μ ± ε.
An interval estimate will NOT contain μ if the sample mean, $\bar{x}$, lies outside
the interval μ ± ε.
Each one of the 95% of sample means that fall within a distance of $1{,}96\,\sigma_{\bar{x}}$
from μ will result in an interval $\bar{x} \pm 1{,}96\,\sigma_{\bar{x}}$ that contains the population
mean somewhere within it.
Since 95% of such intervals will contain μ, we can state that we are 95%
confident that the population mean, μ, is in the interval.
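The 95% statement is a claim about repeated sampling, and a small simulation can make it concrete. The Python sketch below is illustrative only: the population values (mu = 20, sigma = 1.8), the sample size, the number of trials, the Normal population and the function name `coverage` are all assumptions chosen for the demonstration. It counts how often the interval x̄ ± 1,96 σ/√n captures μ. Note that Python writes decimals with a point rather than a comma.

```python
import random
from math import sqrt


def coverage(mu=20.0, sigma=1.8, n=36, trials=4000, z=1.96, seed=1):
    """Fraction of simulated intervals x_bar ± z*sigma/sqrt(n) that contain mu."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    eps = z * sigma / sqrt(n)          # margin of error (sigma assumed known)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(sample) / n        # the centre of the interval moves with each sample
        if x_bar - eps <= mu <= x_bar + eps:
            hits += 1
    return hits / trials


print(coverage())   # expected to be close to 0.95
```

The margin of error is the same in every trial; only the centre of the interval changes, which is exactly the distinction drawn between μ ± ε and x̄ ± ε.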
The formula for an interval estimate for μ with any level of confidence
may be deduced as a generalization of the 95% confidence interval above.
In general, if we let the area in each tail be α/2, then the corresponding Z-value
is referred to as $Z_{\alpha/2}$; hence the margin of error is $\varepsilon = Z_{\alpha/2}\,\sigma_{\bar{x}}$. Then
(1−α) × 100% is called the “level of confidence” that the interval
$\bar{x} \pm Z_{\alpha/2}\,\sigma_{\bar{x}}$ contains μ somewhere within it.
The (1−α)100% confidence interval is given by the formula
$$\bar{x} \pm Z_{\alpha/2}\,\sigma_{\bar{x}}$$
In some applications the population standard deviation σ will be known.
For example the variation in the diameter of discs cut by a certain
machine may have been established over a period of time. In other
applications σ will not be known. In such cases (provided n ≥ 30), σ is
estimated by s, the point estimate calculated from the sample data.
Hence, when σ is unknown, the confidence interval for μ is
$$\bar{x} \pm Z_{\alpha/2}\, s_{\bar{x}}$$
where $s_{\bar{x}} = s/\sqrt{n}$ is the sample standard error of the mean.
EXAMPLE
An importer of herbs and spices claims that the average weight of packets
of saffron is 20 g. A random sample of 36 packets of saffron is
collected. From the sample, the average weight is calculated as 19,35 g.
The population standard deviation of weights is known to be 1,8 g.
a) Calculate the 95% confidence interval for the population average
weight, μ.
b) Calculate the 99% confidence interval for the population average
weight, μ.
c) Estimate the range for the total weight of 50 packets of saffron with 95%
confidence.
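a) The standard error is
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{1{,}8}{\sqrt{36}} = \frac{1{,}8}{6} = 0{,}3.$$
With $Z_{0{,}025} = 1{,}96$:
$$\bar{x} \pm Z_{0{,}025}\,\sigma_{\bar{x}} = 19{,}35 \pm 1{,}96 \times 0{,}3 = 19{,}35 \pm 0{,}5880$$
The interval runs from 18,762 to 19,938.
b) With $Z_{0{,}005} = 2{,}5758$:
$$\bar{x} \pm Z_{0{,}005}\,\sigma_{\bar{x}} = 19{,}35 \pm 2{,}5758 \times 0{,}3 = 19{,}35 \pm 0{,}7727$$
The interval runs from 18,5773 to 20,1227.
c) Total weight of packets = number of packets × $\bar{x}$.
The mean weight per packet is between 18,762 and 19,938 g, with
95% confidence.
Hence the total weight of 50 packets is between 938,1 and 996,9 g, with
95% confidence.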
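The arithmetic of the saffron example can be checked with a short Python sketch. The function name `z_interval` is hypothetical, and the Z-value is taken from the standard library's `statistics.NormalDist` rather than from a table, so the last decimal places may differ slightly from values rounded by hand. Python uses a decimal point where these notes use a comma.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(x_bar, sigma, n, confidence):
    """Two-sided interval x_bar ± Z_{alpha/2} * sigma / sqrt(n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for 95%, 2.5758 for 99%
    eps = z * sigma / sqrt(n)                 # margin of error
    return x_bar - eps, x_bar + eps


# Saffron data: x_bar = 19.35, sigma = 1.8, n = 36
low95, high95 = z_interval(19.35, 1.8, 36, 0.95)
low99, high99 = z_interval(19.35, 1.8, 36, 0.99)

print(round(low95, 3), round(high95, 3))            # part a)
print(round(low99, 3), round(high99, 3))            # part b)
print(round(50 * low95, 1), round(50 * high95, 1))  # part c), total for 50 packets
```

Raising the confidence level from 95% to 99% widens the interval because the Z-value grows while the standard error stays at 0,3.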
One-sided confidence intervals
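The lower limit, above which we are (1−α)100% confident the
population mean lies:
$$\bar{x} - Z_{\alpha}\frac{\sigma}{\sqrt{n}} \quad\text{or}\quad \bar{x} - Z_{\alpha}\frac{s}{\sqrt{n}} \ \ (\text{if } \sigma \text{ is unknown})$$
The upper limit, below which we are (1−α)100% confident the
population mean lies:
$$\bar{x} + Z_{\alpha}\frac{\sigma}{\sqrt{n}} \quad\text{or}\quad \bar{x} + Z_{\alpha}\frac{s}{\sqrt{n}} \ \ (\text{if } \sigma \text{ is unknown})$$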
EXAMPLE
A property investor claims that the average rental income per room in
student accommodation is at most £5.000 per year. The mean rent paid
by a random sample of 36 students is £5.200, with a sample standard deviation of
£735.
a) Calculate a 90% confidence interval for the true mean annual rental
income.
b) Calculate the lower limit of the one-sided 95% confidence interval.
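a) The sample standard error is
$$s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{735}{\sqrt{36}} = 122{,}5$$
With $Z_{0{,}05} = 1{,}6449$:
$$\bar{x} \pm Z_{0{,}05}\, s_{\bar{x}} = 5.200 \pm 1{,}6449 \times 122{,}5 = 5.200 \pm 201{,}5$$
The interval runs from 4998,5 to 5401,5.
b) For a one-sided interval the whole of α = 5% lies in one tail, so $Z_{0{,}05} = 1{,}6449$ is needed.
The 95% lower confidence limit is
$$\bar{x} - Z_{0{,}05}\, s_{\bar{x}} = 5.200 - 1{,}6449 \times 122{,}5 = 4998{,}5$$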
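A minimal Python sketch of the one-sided limits, using the rental data. The function names `lower_limit` and `upper_limit` are hypothetical. The only difference from the two-sided case is that the whole of α is placed in one tail, so the quantile is taken at 1 − α instead of 1 − α/2.

```python
from math import sqrt
from statistics import NormalDist


def lower_limit(x_bar, s, n, confidence):
    """One-sided lower confidence limit: x_bar - Z_alpha * s / sqrt(n)."""
    z = NormalDist().inv_cdf(confidence)   # all of alpha in a single tail
    return x_bar - z * s / sqrt(n)


def upper_limit(x_bar, s, n, confidence):
    """One-sided upper confidence limit: x_bar + Z_alpha * s / sqrt(n)."""
    z = NormalDist().inv_cdf(confidence)
    return x_bar + z * s / sqrt(n)


# Rental data: x_bar = 5200, s = 735, n = 36, 95% one-sided
print(round(lower_limit(5200, 735, 36, 0.95), 1))
```

This is why the one-sided 95% limit coincides with the lower end of the two-sided 90% interval: both use Z = 1,6449.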
Confidence intervals for proportions
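We saw that the Central Limit Theorem (CLT) states that sample
means are Normally distributed for n ≥ 30, with mean μ and variance σ²/n
(that is, standard error σ/√n):
$$\bar{x} \sim N\!\left(\mu,\ \frac{\sigma^{2}}{n}\right)$$
Based on the CLT we derived the formula for the confidence
interval for the mean as
$$\bar{x} \pm Z_{\alpha/2}\,\sigma_{\bar{x}}$$
Similarly, the CLT states that sample proportions are Normally distributed
for n ≥ 30:
$$p \sim N\!\left(\pi,\ \frac{\pi(1-\pi)}{n}\right)$$
Based on the CLT, and estimating π in the standard error by the sample proportion p, the formula for the confidence interval for the
population proportion is given as
$$p \pm Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$$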
EXAMPLE
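In a poll of 200 voters, 88 stated that they will vote for the Green
party candidate. Construct a 95% confidence interval for the population
proportion. Comment on the precision of the interval.
p = 88/200 = 0,44 and $Z_{\alpha/2} = 1{,}96$, so the interval is
$$0{,}44 \pm 1{,}96\sqrt{\frac{0{,}44 \times 0{,}56}{200}} = 0{,}44 \pm 0{,}0688$$
The interval, from 0,3712 to 0,5088, is too wide to be of much practical use.
Suppose that p = 0,44 but n = 1.000. The interval will be
$$0{,}44 \pm 1{,}96\sqrt{\frac{0{,}44 \times 0{,}56}{1.000}} = 0{,}44 \pm 0{,}0308.$$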
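The effect of the sample size on the margin of error for a proportion can be reproduced with a few lines of Python. This is a sketch under stated assumptions: `proportion_interval` is a hypothetical helper, and the second call assumes 440 "yes" answers out of 1000 so that p stays at 0,44.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(successes, n, confidence=0.95):
    """Return (p, eps) for the interval p ± Z_{alpha/2} * sqrt(p(1-p)/n)."""
    p = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    eps = z * sqrt(p * (1 - p) / n)   # margin of error
    return p, eps


p_small, eps_small = proportion_interval(88, 200)     # the poll of 200 voters
p_large, eps_large = proportion_interval(440, 1000)   # same p, assumed n = 1000

print(round(eps_small, 4), round(eps_large, 4))
print(round(eps_small / eps_large, 3))   # ratio of the two margins of error
```

Because ε is proportional to 1/√n, multiplying the sample size by 5 divides the margin of error by √5, about 2,24, not by 5.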
The precision of confidence intervals for means and proportions
It has already been noted that very wide interval estimates are of little
practical use, and that increasing the sample size reduces the width of a
confidence interval, that is, improves its precision. To
calculate the sample size required to give an interval estimate of a
specified precision, return to the formulae used to calculate confidence
intervals for means and proportions.
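The precision of the confidence interval
$$\bar{x} \pm Z_{\alpha/2}\,\sigma_{\bar{x}}$$
can be written as
$$\varepsilon = Z_{\alpha/2}\,\sigma_{\bar{x}} = Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
So we can solve the equation for n and get
$$n = \left(\frac{Z_{\alpha/2}\,\sigma}{\varepsilon}\right)^{2}.$$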
This is the sample size for (1-α)100% confidence interval for μ, with
precision ±ε.
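For readers who want the intermediate algebra, the steps from the margin of error to the sample size can be written out as follows. Nothing is assumed beyond the standard error σ/√n used throughout this lesson.

```latex
\begin{align*}
\varepsilon &= Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}
  && \text{margin of error} \\
\sqrt{n} &= \frac{Z_{\alpha/2}\,\sigma}{\varepsilon}
  && \text{multiply by } \sqrt{n}\text{, divide by } \varepsilon \\
n &= \left(\frac{Z_{\alpha/2}\,\sigma}{\varepsilon}\right)^{2}
  && \text{square both sides}
\end{align*}
```

Since n must be a whole number, the result is rounded up to the next integer so that the margin of error does not exceed ε.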
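Similarly, to calculate the sample size that will give a confidence interval
for proportions with a specified precision (±ε), substitute the required
value for ε in the equation
$$\varepsilon = Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$$
and solve for n:
$$n = \frac{Z_{\alpha/2}^{2}\; p(1-p)}{\varepsilon^{2}}$$
This is the sample size for a (1−α)100% confidence interval for
proportions with precision ±ε.
When p is unknown, substitute p = 0,5. This maximizes p(1−p) and therefore
gives the largest sample size that could be needed, which guarantees the
required precision whatever the true proportion.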
EXAMPLE
For the data in the saffron packets example, calculate the sample size
that will give a 99% confidence interval for the population mean with a
margin of error of ±0,5 when σ = 1,8.
With $Z_{0{,}005} = 2{,}5758$:
$$n = \left(\frac{2{,}5758 \times 1{,}8}{0{,}5}\right)^{2} = 85{,}986$$
Rounding up, a sample of 86 packets is required.
For the data of the Green party candidate, calculate the sample size that
will give a 95% confidence interval with a margin of error of ±0,01 for the
population proportion when p is unknown.
$Z_{0{,}025} = 1{,}96$.
Since p is unknown, we put p = 0,5, which gives the largest sample size that could be required:
$$n = \frac{(1{,}96)^{2} \times 0{,}5 \times 0{,}5}{0{,}01^{2}} = 9604$$
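Both sample-size calculations can be sketched in Python as follows. The function names are hypothetical, the Z-values come from `statistics.NormalDist` instead of a table, and the result is rounded up with `ceil` because a sample size must be a whole number. The default p = 0.5 is the conservative choice when no estimate of the proportion is available.

```python
from math import ceil
from statistics import NormalDist


def z_value(confidence):
    """Z_{alpha/2} for a two-sided interval at the given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def sample_size_mean(sigma, eps, confidence):
    """Smallest n with Z_{alpha/2} * sigma / sqrt(n) <= eps."""
    return ceil((z_value(confidence) * sigma / eps) ** 2)


def sample_size_proportion(eps, confidence, p=0.5):
    """Smallest n with Z_{alpha/2} * sqrt(p(1-p)/n) <= eps."""
    return ceil(z_value(confidence) ** 2 * p * (1 - p) / eps ** 2)


print(sample_size_mean(sigma=1.8, eps=0.5, confidence=0.99))   # saffron packets
print(sample_size_proportion(eps=0.01, confidence=0.95))       # Green party poll
```

Halving the margin of error quadruples the required sample size, since ε appears squared in the denominator of both formulas.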
Confidence intervals for differences between means and proportions
While the estimation of a single population mean or proportion is
important, there are situations where we may be more interested in
estimating the difference between two means or proportions. For
example, we may be interested in whether the percentage that intends to
vote for party B is higher than for party A, or whether commuting time is
shorter by train than by car.
It was stated that the difference between two independent Normal
random variables is also Normal, with mean equal to the
difference between the two means and variance equal to the sum of the
variances.
If $X_1 \sim N(\mu_1, \sigma_1^2)$ and $X_2 \sim N(\mu_2, \sigma_2^2)$, then
$$X_1 - X_2 \sim N\!\left(\mu_1 - \mu_2,\ \sigma_1^2 + \sigma_2^2\right).$$
Similarly, the sample means are Normally distributed
(n ≥ 30), so the distribution of differences between every possible
pair of sample means is given by
$$\bar{X}_1 - \bar{X}_2 \sim N\!\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$$
with n1 and n2 ≥ 30.
Hence, the (1−α)100% confidence interval for (μ1 − μ2) is
$$(\bar{x}_1 - \bar{x}_2) \pm Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
If the sample sizes are 30 or more and σ1 and σ2 are unknown, they may
be estimated by s1 and s2, and the confidence interval is
$$(\bar{x}_1 - \bar{x}_2) \pm Z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
Strictly speaking, the t-percentage point should be used when σ is
unknown, but the Z percentage point is a good approximation for large n.
Difference between proportions
The sample proportions are Normally distributed for n1 and n2 ≥ 30
according to the Central Limit Theorem. Hence the difference between
sample proportions is also Normally distributed:
$$p_1 - p_2 \sim N\!\left(\pi_1 - \pi_2,\ \frac{\pi_1(1-\pi_1)}{n_1} + \frac{\pi_2(1-\pi_2)}{n_2}\right).$$
The point estimate for the difference between two population proportions is (p1 − p2), and
the standard error for the difference between sample proportions is
$$s_{p_1 - p_2} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$
Hence the confidence interval for the difference between population
proportions (π1 − π2) is
$$(p_1 - p_2) \pm Z_{\alpha/2}\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$
EXAMPLE
Designers of rowing equipment investigate the difference between the
mean weights of male and female rowing teams. Random samples of
male and female rowers are selected: the sample sizes and average
weights and sample standard deviations are given below.
                          Male rowers    Female rowers
Sample size                    42              30
Sample mean                   60,5            52,6
Sample standard dev.           6,8             4,5
a) Calculate the 95% confidence interval for the difference in means
between male and female rowers.
b) What inference can be drawn from your results about the difference
between population means; the difference between individuals in
each population?
a) The difference between means is (60,5 − 52,6) = 7,9.
The standard error is
$$\sqrt{\frac{(6{,}8)^{2}}{42} + \frac{(4{,}5)^{2}}{30}} = 1{,}3326$$
$Z_{0{,}025} = 1{,}96$.
The confidence interval is 7,9 ± 1,96 × 1,3326 = 7,9 ± 2,6119 = (5,2881;
10,5119).
b) We are 95% confident that the mean weight of male rowers exceeds
the mean weight of female rowers by between 5,2881 and 10,5119.
Although we are very confident that the mean for male rowers is
greater than the mean for female rowers, we cannot assume that
any individual male rower will be heavier than any individual female
rower. This is because the variance of individual values is n times
greater than the variance of sample means.
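The rowing calculation can be checked with the Python sketch below. The helper name `diff_means_interval` is hypothetical, and, as in the notes, the Z-value is used in place of the t-value even though the standard deviations are estimated from the samples.

```python
from math import sqrt
from statistics import NormalDist


def diff_means_interval(x1, s1, n1, x2, s2, n2, confidence=0.95):
    """Interval (x1 - x2) ± Z_{alpha/2} * sqrt(s1^2/n1 + s2^2/n2)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)   # variances add, then take the root
    diff = x1 - x2
    return diff - z * se, diff + z * se, se


# Male rowers: mean 60.5, s 6.8, n 42; female rowers: mean 52.6, s 4.5, n 30
low, high, se = diff_means_interval(60.5, 6.8, 42, 52.6, 4.5, 30)
print(round(se, 4), round(low, 3), round(high, 3))
```

The whole interval lies above zero, which supports the conclusion about the population means; it says nothing about any particular pair of individual rowers.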
EXAMPLE
The table below gives the results for polls taken in two localities.
                          Area A    Area B
Sample size                 200       160
Vote for Green party         88        54
pA = 88/200 = 0,44; pB = 54/160 = 0,3375
(pA − pB) = 0,1025.
The standard error is
$$\sqrt{\frac{0{,}44(1-0{,}44)}{200} + \frac{0{,}3375(1-0{,}3375)}{160}} = 0{,}0513$$
The 95% confidence interval is 0,1025 ± 1,96 × 0,0513 = 0,1025 ± 0,1005 =
(0,0020; 0,2030).
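A Python sketch of the same calculation for the two polls, assuming a 95% confidence level (Z close to 1,96). The helper name `diff_proportions_interval` is hypothetical and takes the raw counts from the table.

```python
from math import sqrt
from statistics import NormalDist


def diff_proportions_interval(x1, n1, x2, n2, confidence=0.95):
    """Interval (p1 - p2) ± Z_{alpha/2} * sqrt(p1(1-p1)/n1 + p2(1-p2)/n2)."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se, se


# Area A: 88 of 200; Area B: 54 of 160
low, high, se = diff_proportions_interval(88, 200, 54, 160)
print(round(se, 4), round(low, 4), round(high, 4))
```

Passing `confidence=0.90` instead makes the function use Z close to 1,6449 and returns a narrower interval.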