CHAPTER 7
SAMPLING DISTRIBUTION
7.2 Sampling Plans and Experimental Designs
The way a sample is selected is called the sampling plan or experimental design. Simple
random sampling is a commonly used sampling plan in which every sample of size n has the
same chance of being selected. Four of the most commonly used sampling plans are given as follows.
Definition 1. If a sample of n elements is selected from a population of N elements
using a plan in which each of the possible samples has an equal chance of selection, then
the sampling is said to be random and the resulting sample is a simple random sample.
Example 1. Suppose we want to select a sample of size n = 2 from a population
containing N = 4 objects (say, A, B, C, and D). There are six distinct samples that could
be selected, as listed in the following table.
Sample    Observations in Sample
1         A, B
2         A, C
3         A, D
4         B, C
5         B, D
6         C, D
If each of these six samples has an equal chance of being selected, given by 1/6, then the
resulting sample is called a simple random sample, or just a random sample. In general, we
have the definition above.
The selection of a simple random sample can be done by using random numbers, that is, digits generated so that the values 0 to 9 occur randomly and with equal frequency. Another
method is to let a computer generate random numbers for sampling.
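As an illustration of letting a computer do the selection, the following minimal Python sketch lists the six possible samples of Example 1 and then draws one simple random sample. The function name `draw_simple_random_sample` and the seed are assumptions made for this sketch, not part of the original notes.

```python
import itertools
import random

population = ["A", "B", "C", "D"]
n = 2

# Every distinct sample of size n (order ignored); C(4, 2) = 6 of them.
all_samples = list(itertools.combinations(population, n))

def draw_simple_random_sample(items, size, rng):
    """Draw `size` distinct items so that every subset of that size is equally likely."""
    return rng.sample(items, size)

rng = random.Random(2024)  # arbitrary seed, used only to make the run repeatable
sample = draw_simple_random_sample(population, n, rng)

print(all_samples)
print(sample)
```

`random.sample` selects without replacement, which matches the definition of a simple random sample used here.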
Definition 2. When the population consists of two or more subpopulations, called strata,
a sampling plan that ensures that a simple random sample is selected from each subpopulation is called a stratified random sample.
Example 2. Suppose a public opinion poll designed to estimate the proportion of voters
who favor spending more tax revenue on an improved ambulance service is to be conducted
in a certain county. The county contains two cities and a rural area. The population elements of interest for the poll are all men and women of voting age who reside in the county.
A stratified random sample of adults residing in the county can be obtained by selecting a
simple random sample of adults from each city and another simple random sample of adults
from the rural area. In this case, the two cities and the rural area represent three strata
from which simple random samples are selected.
The principal reasons for using stratified random sampling rather than simple random
sampling are as follows:
1. Stratification may produce a smaller sampling error than would be produced by a
simple random sample of the same size. This result is particularly true if measurements
within strata are homogeneous.
2. The cost per observation in the survey may be reduced by stratification of the population elements into convenient groupings.
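A stratified random sample like the one in Example 2 might be drawn as in the sketch below. The stratum names, the frame sizes, and the per-stratum sample sizes are all hypothetical values chosen for illustration; the notes do not specify them.

```python
import random

def stratified_sample(strata, sizes, rng):
    """Take an independent simple random sample of sizes[name] units from each stratum."""
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}

# Hypothetical frames: one list of voter IDs per stratum.
strata = {
    "city_1": [f"c1-{i}" for i in range(5000)],
    "city_2": [f"c2-{i}" for i in range(3000)],
    "rural": [f"r-{i}" for i in range(2000)],
}
# Assumed allocation, proportional to the hypothetical stratum sizes above.
sizes = {"city_1": 50, "city_2": 30, "rural": 20}

rng = random.Random(1)  # arbitrary seed for repeatability
sample = stratified_sample(strata, sizes, rng)

for name, units in sample.items():
    print(name, len(units), units[:3])
```

Proportional allocation is only one possible design choice; other allocations are equally compatible with Definition 2.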
Definition 3. When the available sampling units are groups of elements, called clusters,
a cluster sample is a simple random sample of clusters from the available clusters in the
population.
Example 3. To estimate the average income per household in a large city, how should
researchers choose the sample? If they use simple random sampling, they will need a frame listing all households (elements) in the city, and this frame may be very costly or impossible
to obtain. They cannot avoid this problem by using stratified random sampling because a
frame is still required for each stratum in the population. Rather than draw a simple random sample of elements, they could divide the city into regions such as blocks (or clusters
of elements) and select a simple random sample of blocks from the population. This task
is easily accomplished by using a frame that lists all city blocks. Then the income of every
household within each sampled block could be measured.
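The block-sampling idea of Example 3 can be sketched as follows. The city data are simulated and entirely hypothetical, and the estimate shown (total sampled income divided by total sampled households) is one simple choice assumed for this sketch rather than a method stated in the notes.

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Select a simple random sample of whole clusters; every element of a chosen cluster is kept."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

rng = random.Random(7)  # arbitrary seed for repeatability

# Hypothetical frame of 40 city blocks, each holding 5 to 15 household incomes.
city = {
    f"block-{b}": [rng.randint(20, 120) * 1000 for _ in range(rng.randint(5, 15))]
    for b in range(40)
}

sampled = cluster_sample(city, 8, rng)
incomes = [income for households in sampled.values() for income in households]
estimate = sum(incomes) / len(incomes)  # average income per sampled household
print(len(sampled), len(incomes), round(estimate))
```

Only the list of blocks is needed as a frame; households are enumerated only inside the selected blocks.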
Definition 4. A 1-in-k systematic random sample involves the random selection of one
of the first k elements in an ordered population, and then the systematic selection of every
kth element thereafter.
A systematic sample is generally spread more uniformly over the entire population and
thus may provide more information about the population than an equivalent amount of data
contained in a simple random sample.
Example 4. Suppose we wish to select a 1-in-5 systematic sample of travel vouchers
from a stack of N = 1000 (that is, sample n = 200 vouchers) to determine the proportion
of vouchers filed correctly. A voucher is drawn at random from the first five vouchers (for
instance, number 2), and every fifth voucher thereafter is included in the sample. Suppose
that most of the first 500 vouchers have been correctly filed, but because of a change in clerk,
the second 500 have all been incorrectly filed. Simple random sampling could accidentally
select a large number (perhaps all) of the 200 vouchers from either the first or the second
500 vouchers and hence yield a very poor estimate of the true proportion of correct filing.
In contrast, the systematic sampling would select an equal number of vouchers from each of
the two groups and would give a very accurate estimate of the fraction of vouchers correctly
filed.
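The 1-in-5 selection of Example 4 can be checked with a short sketch. The function name `systematic_sample` and the 1-based voucher numbering are assumptions made for illustration.

```python
import random

def systematic_sample(population_size, k, rng):
    """1-in-k systematic sample: random start among the first k positions, then every kth one."""
    start = rng.randint(1, k)  # positions are numbered 1..population_size
    return list(range(start, population_size + 1, k))

rng = random.Random(0)  # arbitrary seed for repeatability
positions = systematic_sample(1000, 5, rng)

first_half = sum(1 for pos in positions if pos <= 500)
second_half = len(positions) - first_half
print(len(positions), first_half, second_half)  # 200 100 100
```

Whatever the random start, the sample contains 200 vouchers with exactly 100 from each half of the stack, which is the balance the example relies on.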
7.3 Statistics and Sampling Distribution
When we select a random sample from a population, the numerical descriptive measures,
such as the mean, standard deviation, and so on, calculated from the sample are referred to as
statistics. These statistics vary or change for each different random sample we select; that
is, they are random variables. The probability distributions for statistics are called sampling
distributions because, in repeated sampling, they provide this information:
* What values of the statistic can occur.
* How often each value occurs.
Definition 5. The sampling distribution of a statistic is the probability distribution for
the possible values of the statistic that results when random samples of size n are repeatedly
drawn from the population.
There are three ways of finding the sampling distribution of a statistic:
1. Derive the distribution mathematically using the laws of probability.
2. Use simulation to approximate the distribution. That is, draw a large number of
samples of size n, calculate the value of the statistic for each sample, and tabulate the
results in a relative frequency histogram. When the number of samples is large, the histogram
will be very close to the theoretical sampling distribution.
3. Use statistical theorems to derive the exact or approximate sampling distribution.
Example 5. Suppose a population consists of N = 5 numbers: 3, 6, 9, 12, 15. If a
random sample of size n = 3 is selected without replacement, find the sampling distribution
for
(a) the sample mean $\bar{x}$,
(b) the sample median m.
Solution. All possible random samples of size n = 3 and their corresponding means and
medians are given below.

Sample    Observations in Sample    Sample Mean    Sample Median
1         3, 6, 9                   6              6
2         3, 6, 12                  7              6
3         3, 6, 15                  8              6
4         3, 9, 12                  8              9
5         3, 9, 15                  9              9
6         3, 12, 15                 10             12
7         6, 9, 12                  9              9
8         6, 9, 15                  10             9
9         6, 12, 15                 11             12
10        9, 12, 15                 12             12

(a) The sampling distribution for the sample mean $\bar{x}$ is given by
$P\{\bar{x} = 6\} = 1/10 = 0.1$
$P\{\bar{x} = 7\} = 1/10 = 0.1$
$P\{\bar{x} = 8\} = 2/10 = 0.2$
$P\{\bar{x} = 9\} = 2/10 = 0.2$
$P\{\bar{x} = 10\} = 2/10 = 0.2$
$P\{\bar{x} = 11\} = 1/10 = 0.1$
$P\{\bar{x} = 12\} = 1/10 = 0.1.$
That is,

$\bar{x}$       6     7     8     9     10    11    12
$p(\bar{x})$    0.1   0.1   0.2   0.2   0.2   0.1   0.1
(b) The sampling distribution for the sample median m is given by
$P\{m = 6\} = 3/10 = 0.3$
$P\{m = 9\} = 4/10 = 0.4$
$P\{m = 12\} = 3/10 = 0.3.$
That is,

m       6     9     12
p(m)    0.3   0.4   0.3
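The enumeration in Example 5 can be reproduced mechanically. The sketch below lists all 10 samples of size 3 and tabulates the mean and the median with exact fractions; the helper names are assumptions made for this illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [3, 6, 9, 12, 15]
n = 3

# All C(5, 3) = 10 equally likely samples drawn without replacement.
samples = list(combinations(population, n))

def sample_mean(s):
    return Fraction(sum(s), len(s))

def sample_median(s):
    return sorted(s)[len(s) // 2]  # middle value; valid because n is odd here

def sampling_distribution(statistic):
    """Map each possible value of the statistic to its exact probability."""
    counts = Counter(statistic(s) for s in samples)
    return {value: Fraction(c, len(samples)) for value, c in sorted(counts.items())}

mean_dist = sampling_distribution(sample_mean)
median_dist = sampling_distribution(sample_median)

for value, prob in mean_dist.items():
    print("mean", value, float(prob))
for value, prob in median_dist.items():
    print("median", value, float(prob))
```

The printed probabilities agree with the two tables in the solution, since both come from the same list of 10 samples.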
Note. It is usually very difficult to derive sampling distributions by the method described
in the preceding example. When this method is no longer feasible, we may have to use one
of these methods:
* Use a simulation to approximate the sampling distribution empirically.
* Rely on statistical theorems and theoretical results.
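The simulation approach can be sketched on the same small population used in Example 5, where the exact answer is known and the approximation can be judged. The number of repetitions and the seed are arbitrary assumptions.

```python
import random
from collections import Counter

population = [3, 6, 9, 12, 15]
n = 3
repetitions = 20_000       # assumed number of simulated samples
rng = random.Random(12345)  # arbitrary seed for repeatability

counts = Counter()
for _ in range(repetitions):
    sample = rng.sample(population, n)  # one sample of size n, without replacement
    counts[sum(sample) / n] += 1        # record the value of the sample mean

# Relative frequencies approximate the sampling distribution of the mean.
relative_frequency = {value: counts[value] / repetitions for value in sorted(counts)}
for value, freq in relative_frequency.items():
    print(value, round(freq, 3))
```

With this many repetitions the relative frequencies should land close to the exact probabilities 0.1, 0.1, 0.2, 0.2, 0.2, 0.1, 0.1, though the individual digits depend on the seed.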
7.4 The Central Limit Theorem
The Central Limit Theorem states that, under rather general conditions, sums and means
of random samples of measurements drawn from a population tend to have an approximately
normal distribution.
Consider an experiment of tossing a balanced die n times. Let $\bar{x}$ denote the mean of
the numbers on the n upper faces. If we use computer software to generate and plot the
histograms of the sampling distribution of $\bar{x}$ for n = 2, n = 3, n = 4, and so on, we will
find that the shape of these histograms looks more and more like a
normal curve as n becomes larger and larger.
Theorem 1 (Central Limit Theorem). If random samples of n observations are drawn
from a nonnormal population with finite mean $\mu$ and standard deviation $\sigma$, then, when n is
large, the sampling distribution of the sample mean $\bar{x}$ is approximately normally distributed,
with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$. The approximation becomes more accurate as
n becomes large.
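For the die-tossing experiment, the sampling distribution of the mean can be computed exactly for small n by listing all 6^n outcomes. The sketch below does this and compares the standard deviation with $\sigma/\sqrt{n}$, using the fact that a single fair die has mean 3.5 and variance 35/12. The function names are assumptions made for this illustration.

```python
import math
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 7)

def distribution_of_mean(n):
    """Exact distribution of the mean of n fair dice, by enumerating all 6**n outcomes."""
    counts = Counter(Fraction(sum(roll), n) for roll in product(faces, repeat=n))
    return {value: Fraction(c, 6 ** n) for value, c in sorted(counts.items())}

def mean_and_sd(dist):
    mu = sum(value * prob for value, prob in dist.items())
    variance = sum((value - mu) ** 2 * prob for value, prob in dist.items())
    return float(mu), math.sqrt(variance)

sigma = math.sqrt(35 / 12)  # standard deviation of a single fair die
for n in (1, 2, 3, 4):
    mu_n, sd_n = mean_and_sd(distribution_of_mean(n))
    print(n, mu_n, round(sd_n, 4), round(sigma / math.sqrt(n), 4))
```

Printing `distribution_of_mean(n)` for increasing n shows the probabilities piling up around 3.5 in the bell-like pattern the paragraph describes.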
Example 6. Achievement test scores of all high school seniors in a certain state have
mean $\mu = 60$ and variance $\sigma^2 = 64$. A random sample of n = 100 students from a large high
school had a mean score of 58. Is there evidence to suggest that this high school is inferior?
Solution. Let $\bar{x}$ denote the mean of a random sample of n = 100 scores from a population
with $\mu = 60$ and $\sigma^2 = 64$. We wish to calculate the probability that the sample mean $\bar{x}$ is
at most 58, namely, $P\{\bar{x} \le 58\}$. By the Central Limit Theorem, it follows that
$P\{\bar{x} \le 58\} \approx P\{z \le -2.5\} = 0.0062$
where the standardized value of the mean score 58 is calculated as
$\frac{58 - 60}{8/\sqrt{100}} = -2.5.$
Since this probability is exceedingly small, it is unlikely that a random sample of 100 students from a school
performing at the state average would produce a mean score of 58 or lower. This evidence suggests that the average score for this high
school is inferior.
7.5 The Sampling Distribution of the Sample Mean
Theorem 1 (The Sampling Distribution of the Sample Mean $\bar{x}$)
* If a random sample of n measurements is selected from a population with mean $\mu$ and
standard deviation $\sigma$, the sampling distribution of the sample mean $\bar{x}$ will have mean $\mu$ and
standard deviation $\sigma/\sqrt{n}$.
* If the population has a normal distribution, the sampling distribution of the sample
mean $\bar{x}$ will be exactly normally distributed with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$.
* If the population distribution is nonnormal, the sampling distribution of the sample
mean $\bar{x}$ will be approximately normally distributed, with mean $\mu$ and standard deviation
$\sigma/\sqrt{n}$, for large samples (by the Central Limit Theorem).
Definition 6. The standard deviation of a statistic used as an estimator of a population
parameter is also called the standard error of the estimator (abbreviated SE) because it
refers to the precision of the estimator.
Therefore, the standard deviation of $\bar{x}$, given by $\sigma/\sqrt{n}$, is referred to as the standard
error of the mean, abbreviated as $SE(\bar{x})$ or just SE.
Example 7. The duration of Alzheimer’s disease from the onset of symptoms until death
ranges from 3 to 20 years; the average is 8 years with a standard deviation of 4 years. The
administrator of a large medical center randomly selects the medical records of 30 deceased
Alzheimer’s patients from the medical center’s database and records the average duration.
Find the approximate probability that the average
(a) is less than 7 years,
(b) exceeds 7 years,
(c) lies within 1 year of the population mean $\mu = 8$.
Solution. The standard error is
$\frac{\sigma}{\sqrt{n}} = \frac{4}{\sqrt{30}} = 0.73.$
(a) To find the probability that the average is less than 7 years, we need to calculate the
standardized value of 7:
$\frac{7 - 8}{0.73} = -1.37.$
Then the desired probability is
$P\{\bar{x} < 7\} \approx P\{z < -1.37\} = 0.0853.$
(b) The probability that the average exceeds 7 years is
$P\{\bar{x} > 7\} \approx P\{z > -1.37\} = 1 - 0.0853 = 0.9147.$
(c) To find the probability that the average lies within 1 year of the population mean
$\mu = 8$, we need to calculate the standardized values of 7 and 9:
$\frac{7 - 8}{0.73} = -1.37$ and $\frac{9 - 8}{0.73} = 1.37.$
Then the required probability is
$P\{7 < \bar{x} < 9\} \approx P\{-1.37 < z < 1.37\}$
$= P\{z < 1.37\} - P\{z < -1.37\}$
$= 0.9147 - 0.0853$
$= 0.8294.$
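The three probabilities of Example 7 can be checked without a normal table. This sketch uses `statistics.NormalDist` from the Python standard library (3.8 or later) and models the sample mean as normal with mean 8 and standard error $4/\sqrt{30}$, as the Central Limit Theorem suggests; the variable names are assumptions.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 8, 4, 30
se = sigma / sqrt(n)          # standard error of the mean, about 0.73
xbar = NormalDist(mu, se)     # approximate sampling distribution of the sample mean

p_less_than_7 = xbar.cdf(7)                  # part (a)
p_more_than_7 = 1 - p_less_than_7            # part (b)
p_within_1 = xbar.cdf(9) - xbar.cdf(7)       # part (c)

print(round(se, 2), round(p_less_than_7, 4), round(p_more_than_7, 4), round(p_within_1, 4))
```

Because the code does not round the standard error or the z-value to two decimals, its results can differ from the table-based answers in the third or fourth decimal place.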
Example 8. To avoid difficulties with the Federal Trade Commission or state and local
consumer protection agencies, a beverage bottler must make reasonably certain that 12-ounce
bottles actually contain 12 ounces of beverage. To determine whether a bottling machine is
working satisfactorily, one bottler randomly samples 30 bottles per hour and measures the
amount of beverage in each bottle. The mean $\bar{x}$ of the 30 fill measurements is used to decide
whether to readjust the amount of beverage delivered per bottle by the filling machine. If
records show that the amount of fill per bottle is normally distributed, with a standard
deviation of 0.3 ounces, and if the bottling machine is set to produce a mean fill per bottle
of 12 ounces, what is the approximate probability that the sample mean $\bar{x}$ of the 30 test
bottles is less than 11.9 ounces?
Solution. The standard error is
$\frac{\sigma}{\sqrt{n}} = \frac{0.3}{\sqrt{30}} = 0.055.$
To find the probability that the sample mean of the 30 test bottles is less than 11.9 ounces,
we need to calculate the standardized value of 11.9:
$\frac{11.9 - 12}{0.055} = -1.82.$
The required probability is then
$P\{\bar{x} < 11.9\} = P\{z < -1.82\} = 0.0344.$
Since this probability is very small, the company should not have difficulties with the Federal
Trade Commission or state and local consumer protection agencies.
Example 9. An electronics firm manufactures light bulbs that have a length of life with
mean 800 hours and a standard deviation of 80 hours. Find the probability that a random
sample of 64 bulbs will have an average life of greater than 775 hours.
Solution. The standard error is
$\frac{\sigma}{\sqrt{n}} = \frac{80}{\sqrt{64}} = 10.$
To find the probability that the sample mean of the 64 bulbs is greater than 775 hours, we
need to calculate the standardized value of 775:
$\frac{775 - 800}{10} = -2.5.$
The required probability is then
$P\{\bar{x} > 775\} \approx P\{z > -2.5\} = 1 - P\{z < -2.5\} = 1 - 0.0062 = 0.9938.$
HOMEWORK: pp. 273-274: 7.19, 7.24, 7.29, 7.30, 7.31, 7.33
7.6 The Sampling Distribution of the Sample Proportion
Let x be a binomial random variable with n trials and probability p of success. Here
the parameter p can also be referred to as the population proportion of successes. Since x
represents the number of successes in n trials, the sample proportion of successes
$\hat{p} = \frac{x}{n}$
will be used to estimate the population proportion p.
The binomial random variable x has mean $\mu = np$ and standard deviation $\sigma = \sqrt{npq}$.
Since $\hat{p}$ is simply the value of x expressed as a proportion, $\hat{p} = x/n$, the sampling distribution
of $\hat{p}$ is identical to the probability distribution of x, except that it has a new scale along the
horizontal axis. Because of this change of scale, the mean and standard deviation of $\hat{p}$ are
also rescaled, so that the mean of the sampling distribution is p and its standard error is
$SE(\hat{p}) = \sqrt{\frac{pq}{n}}$, where $q = 1 - p$.
Just as we can approximate the probability distribution of the binomial random variable
x with a normal distribution when the sample size n is large, we can do the same with the
sampling distribution of $\hat{p}$.
Theorem 2 (Properties of the Sampling Distribution of the Sample Proportion $\hat{p}$). If a
random sample of n observations is drawn from a binomial population with parameter p,
then the sampling distribution of the sample proportion
$\hat{p} = \frac{x}{n}$
will have a mean p and standard deviation
$SE(\hat{p}) = \sqrt{\frac{pq}{n}}$, where $q = 1 - p$.
When the sample size n is large, the sampling distribution of $\hat{p}$ can be approximated by a
normal distribution. The approximation will be adequate if np > 5 and nq > 5.
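The rule of thumb np > 5 and nq > 5 can be probed numerically by comparing an exact binomial probability with its normal approximation. In the sketch below the two test cases, the function names, and the decision to omit a continuity correction are all assumptions made for illustration.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_cdf(k, n, p):
    """Exact P{x <= k} for a binomial count x, which equals P{p_hat <= k/n}."""
    q = 1 - p
    return sum(comb(n, j) * p**j * q**(n - j) for j in range(k + 1))

def normal_approx_cdf(k, n, p):
    """Normal approximation to P{p_hat <= k/n} with mean p and standard error sqrt(pq/n).
    No continuity correction is applied in this sketch."""
    se = sqrt(p * (1 - p) / n)
    return NormalDist(p, se).cdf(k / n)

def rule_of_thumb_ok(n, p):
    return n * p > 5 and n * (1 - p) > 5

# One case that satisfies the rule and one that does not (both hypothetical).
for n, p, k in [(100, 0.5, 45), (10, 0.1, 0)]:
    exact = binomial_cdf(k, n, p)
    approx = normal_approx_cdf(k, n, p)
    print(n, p, rule_of_thumb_ok(n, p), round(exact, 4), round(approx, 4))
```

In these two cases the approximation error is much smaller when the rule of thumb holds; this is a spot check, not a proof of the rule.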
Example 10. In a survey, 500 mothers and fathers were asked about the importance
of sports for boys and girls. Of the parents interviewed, 60% agree that the genders are
equal and should have equal opportunities to participate in sports. Describe the sampling
distribution of the sample proportion $\hat{p}$ of parents who agree that the genders are equal and
should have equal opportunities.
Solution. Let p denote the population proportion of all parents in the United States
who agree that the genders are equal and should have equal opportunities. The sampling
distribution of $\hat{p}$ can be approximated by a normal distribution, with mean equal to p and
standard error
$SE(\hat{p}) = \sqrt{\frac{pq}{n}}$, where $q = 1 - p$.
It should be noted that the sampling distribution of $\hat{p}$ is centered over its mean p. Even
though we do not know the exact value of p (the sample proportion $\hat{p} = 0.60$ may be larger or
smaller than p), an approximate value for the standard deviation of the sampling distribution
can be found using the sample proportion $\hat{p} = 0.60$ to approximate the unknown value of p.
Thus,
$SE(\hat{p}) = \sqrt{\frac{pq}{n}} \approx \sqrt{\frac{\hat{p}\hat{q}}{n}} = \sqrt{\frac{(0.60)(0.40)}{500}} = 0.022.$
Now the probability that $\hat{p}$ will fall within $2SE(\hat{p}) = 0.044$ of p is given by
$P\{|\hat{p} - p| < 0.044\} = P\left\{\frac{|\hat{p} - p|}{SE(\hat{p})} < \frac{0.044}{0.022}\right\} \approx P\{|z| < 2\}$
$= P\{-2 < z < 2\} = P\{z < 2\} - P\{z < -2\}$
$= 0.9772 - 0.0228$
$= 0.9544.$
Therefore, approximately 95% of the time $\hat{p}$ will fall within $2SE(\hat{p}) = 0.044$ of the (unknown) value of p.
Example 11. Refer to Example 10. Suppose the proportion p of parents in the population is actually equal to 0.55. What is the probability of observing a sample proportion
larger than or equal to the observed value $\hat{p} = 0.60$?
Solution. Since n = 500 and p = 0.55, with observed value $\hat{p} = 0.60$, we calculate
$SE(\hat{p}) = \sqrt{\frac{pq}{n}} = \sqrt{\frac{(0.55)(0.45)}{500}} = 0.0222.$
The required probability is
$P\{\hat{p} \ge 0.60\} \approx P\{z \ge 2.25\} = 1 - P\{z \le 2.25\} = 1 - 0.9878 = 0.0122,$
where the standardized value of 0.60 is
$\frac{0.60 - 0.55}{0.0222} = 2.25.$
That is, if we were to select a random sample of n = 500 observations from a population
with proportion p equal to 0.55, the probability that the sample proportion $\hat{p}$ would be larger
than or equal to 0.60 is only 0.0122.
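The calculations of Examples 10 and 11 can be reproduced as follows. The exact binomial tail at the end is an extra check added in this sketch and is not part of the original solution; the variable names are assumptions.

```python
from math import comb, sqrt
from statistics import NormalDist

n = 500
z = NormalDist()  # standard normal distribution

# Example 10: p is unknown, so p_hat = 0.60 stands in for p in the standard error.
p_hat = 0.60
se_estimated = sqrt(p_hat * (1 - p_hat) / n)
p_within_2se = z.cdf(2) - z.cdf(-2)

# Example 11: p = 0.55 is taken as given.
p = 0.55
se = sqrt(p * (1 - p) / n)
z_value = (p_hat - p) / se
normal_tail = 1 - z.cdf(z_value)

# Extra check: p_hat >= 0.60 with n = 500 means x >= 300 successes.
exact_tail = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(300, n + 1))

print(round(se_estimated, 3), round(p_within_2se, 4))
print(round(se, 4), round(z_value, 2), round(normal_tail, 4), round(exact_tail, 4))
```

The normal tail agrees with the table-based 0.0122 up to rounding of the z-value. The exact binomial tail may differ slightly because the normal calculation here uses no continuity correction.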
HOMEWORK: pp. 279-281: 7.37, 7.41, 7.43, 7.45, 7.47