CHAPTER 7 SAMPLING DISTRIBUTION

7.2 Sampling Plans and Experimental Designs

The way a sample is selected is called the sampling plan or experimental design. Simple random sampling is a commonly used sampling plan in which every sample of size $n$ has the same chance of being selected. The four most commonly used sampling plans are given as follows.

Definition 1. If a sample of $n$ elements is selected from a population of $N$ elements using a plan in which each of the possible samples has an equal chance of selection, then the sampling is said to be random and the resulting sample is a simple random sample.

Example 1. Suppose we want to select a sample of size $n = 2$ from a population containing $N = 4$ objects (say, A, B, C, and D). There are six distinct samples that could be selected, as listed in the following table.

| Sample | Observations in Sample |
|--------|------------------------|
| 1      | A, B                   |
| 2      | A, C                   |
| 3      | A, D                   |
| 4      | B, C                   |
| 5      | B, D                   |
| 6      | C, D                   |

If each of these six samples has an equal chance of being selected, given by $1/6$, then the resulting sample is called a simple random sample, or just a random sample. Definition 1 above states the general case.

The selection of a simple random sample can be done by using random numbers, that is, digits generated so that the values 0 to 9 occur randomly and with equal frequency. Another method is to let a computer generate random numbers for sampling.

Definition 2. When the population consists of two or more subpopulations, called strata, a sampling plan that ensures that a simple random sample is selected from each subpopulation is called a stratified random sample.

Example 2. Suppose a public opinion poll designed to estimate the proportion of voters who favor spending more tax revenue on an improved ambulance service is to be conducted in a certain county. The county contains two cities and a rural area. The population elements of interest for the poll are all men and women of voting age who reside in the county.
A stratified random sample of adults residing in the county can be obtained by selecting a simple random sample of adults from each city and another simple random sample of adults from the rural area. In this case, the two cities and the rural area represent three strata from which simple random samples are selected.

The principal reasons for using stratified random sampling rather than simple random sampling are as follows:

1. Stratification may produce a smaller sampling error than would be produced by a simple random sample of the same size. This result is particularly true if measurements within strata are homogeneous.
2. The cost per observation in the survey may be reduced by stratification of the population elements into convenient groupings.

Definition 3. When the available sampling units are groups of elements, called clusters, a cluster sample is a simple random sample of clusters from the available clusters in the population.

Example 3. To estimate the average income per household in a large city, how should the sample be chosen? Simple random sampling requires a frame listing all households (elements) in the city, and this frame may be very costly or impossible to obtain. Stratified random sampling does not avoid this problem, because a frame is still required for each stratum in the population. Rather than draw a simple random sample of elements, one could divide the city into regions such as blocks (or clusters of elements) and select a simple random sample of blocks from the population. This task is easily accomplished by using a frame that lists all city blocks. Then the income of every household within each sampled block could be measured.

Definition 4. A 1-in-$k$ systematic random sample involves the random selection of one of the first $k$ elements in an ordered population, and then the systematic selection of every $k$th element thereafter.
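As a rough illustration of Definition 4, the following Python sketch draws a 1-in-$k$ systematic sample from an ordered list. The function name `systematic_sample`, the voucher numbering 1 to 1000, and the fixed seed are assumptions made for this example only; they do not come from the text.

```python
import random


def systematic_sample(population, k, rng=None):
    """Return a 1-in-k systematic sample from an ordered population.

    One of the first k elements is chosen at random, and then every
    k-th element after it is taken.
    """
    rng = rng or random.Random()
    start = rng.randrange(k)        # random index among the first k elements
    return population[start::k]     # every k-th element thereafter


# Hypothetical ordered population: items numbered 1..1000.
vouchers = list(range(1, 1001))
sample = systematic_sample(vouchers, k=5, rng=random.Random(0))
print(len(sample), sample[:4])
```

With $N = 1000$ and $k = 5$ the sample always contains 200 elements, whichever starting element is drawn.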
A systematic sample is generally spread more uniformly over the entire population and thus may provide more information about the population than an equivalent amount of data contained in a simple random sample.

Example 4. Suppose we wish to select a 1-in-5 systematic sample of travel vouchers from a stack of $N = 1000$ (that is, sample $n = 200$ vouchers) to determine the proportion of vouchers filed correctly. A voucher is drawn at random from the first five vouchers (for instance, number 2), and every fifth voucher thereafter is included in the sample. Suppose that most of the first 500 vouchers have been correctly filed, but because of a change in clerk, the second 500 have all been incorrectly filed. Simple random sampling could accidentally select a large number (perhaps all) of the 200 vouchers from either the first or the second 500 vouchers and hence yield a very poor estimate of the true proportion of correct filing. In contrast, systematic sampling would select an equal number of vouchers from each of the two groups and would give a very accurate estimate of the fraction of vouchers correctly filed.

7.3 Statistics and Sampling Distribution

When we select a random sample from a population, the numerical descriptive measures calculated from the sample, such as the mean and the standard deviation, are called statistics. These statistics vary from one random sample to another; that is, they are random variables. The probability distributions for statistics are called sampling distributions because, in repeated sampling, they provide this information:

* What values of the statistic can occur.
* How often each value occurs.

Definition 5. The sampling distribution of a statistic is the probability distribution for the possible values of the statistic that results when random samples of size $n$ are repeatedly drawn from the population.

There are three ways of finding the sampling distribution of a statistic:

1. Derive the distribution mathematically using the laws of probability.
2. Use simulation to approximate the distribution. That is, draw a large number of samples of size $n$, calculate the value of the statistic for each sample, and tabulate the results in a relative frequency histogram. When the number of samples is large, the histogram will be very close to the theoretical sampling distribution.
3. Use statistical theorems to derive exact or approximate sampling distributions.

Example 5. Suppose a population consists of $N = 5$ numbers: 3, 6, 9, 12, 15. If a random sample of size $n = 3$ is selected without replacement, find the sampling distribution for (a) the sample mean $\bar{x}$, (b) the sample median $m$.

Solution. All possible random samples of size $n = 3$ and their corresponding means and medians are given below.

| Sample | Observations in Sample | Sample Mean | Sample Median |
|--------|------------------------|-------------|---------------|
| 1      | 3, 6, 9                | 6           | 6             |
| 2      | 3, 6, 12               | 7           | 6             |
| 3      | 3, 6, 15               | 8           | 6             |
| 4      | 3, 9, 12               | 8           | 9             |
| 5      | 3, 9, 15               | 9           | 9             |
| 6      | 3, 12, 15              | 10          | 12            |
| 7      | 6, 9, 12               | 9           | 9             |
| 8      | 6, 9, 15               | 10          | 9             |
| 9      | 6, 12, 15              | 11          | 12            |
| 10     | 9, 12, 15              | 12          | 12            |

(a) Each of the 10 samples is equally likely, so the sampling distribution for the sample mean $\bar{x}$ is given by

$P\{\bar{x} = 6\} = 1/10 = 0.1$
$P\{\bar{x} = 7\} = 1/10 = 0.1$
$P\{\bar{x} = 8\} = 2/10 = 0.2$
$P\{\bar{x} = 9\} = 2/10 = 0.2$
$P\{\bar{x} = 10\} = 2/10 = 0.2$
$P\{\bar{x} = 11\} = 1/10 = 0.1$
$P\{\bar{x} = 12\} = 1/10 = 0.1$

That is,

| $\bar{x}$    | 6   | 7   | 8   | 9   | 10  | 11  | 12  |
|--------------|-----|-----|-----|-----|-----|-----|-----|
| $p(\bar{x})$ | 0.1 | 0.1 | 0.2 | 0.2 | 0.2 | 0.1 | 0.1 |

(b) The sampling distribution for the sample median $m$ is given by

$P\{m = 6\} = 3/10 = 0.3$
$P\{m = 9\} = 4/10 = 0.4$
$P\{m = 12\} = 3/10 = 0.3$

That is,

| $m$    | 6   | 9   | 12  |
|--------|-----|-----|-----|
| $p(m)$ | 0.3 | 0.4 | 0.3 |

Note. It is usually very difficult to derive sampling distributions by the method described in the preceding example. When this method is no longer feasible, we may have to use one of these methods:

* Use a simulation to approximate the sampling distribution empirically.
* Rely on statistical theorems and theoretical results.
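The enumeration in Example 5 can also be carried out by computer. The Python sketch below lists every sample of size 3 drawn without replacement from the population 3, 6, 9, 12, 15 and tabulates the exact sampling distribution of a statistic. The helper name `sampling_distribution` is an assumption made for this illustration, and the approach only works when the number of possible samples is small enough to list.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from statistics import mean, median


def sampling_distribution(population, n, statistic):
    """Exact sampling distribution of `statistic` when every sample of
    size n (drawn without replacement) is equally likely."""
    samples = list(combinations(population, n))
    counts = Counter(statistic(s) for s in samples)
    return {value: Fraction(count, len(samples))
            for value, count in sorted(counts.items())}


population = [3, 6, 9, 12, 15]
mean_dist = sampling_distribution(population, 3, mean)
median_dist = sampling_distribution(population, 3, median)

for value, prob in mean_dist.items():
    print("mean", value, float(prob))
for value, prob in median_dist.items():
    print("median", value, float(prob))
```

For larger populations the number of possible samples grows quickly, which is why simulation or theoretical results are used instead.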
7.4 The Central Limit Theorem

The Central Limit Theorem states that, under rather general conditions, sums and means of random samples of measurements drawn from a population tend to have an approximately normal distribution.

Consider an experiment of tossing a balanced die $n$ times. Let $\bar{x}$ denote the mean of the numbers on the $n$ upper faces. If we use computer software to generate and plot histograms of the sampling distribution of $\bar{x}$ for $n = 2$, $n = 3$, $n = 4$, and so on, we find that the shape of these histograms looks more and more like a normal curve as $n$ becomes larger.

Theorem 1 (Central Limit Theorem). If random samples of $n$ observations are drawn from a nonnormal population with finite mean $\mu$ and standard deviation $\sigma$, then, when $n$ is large, the sampling distribution of the sample mean $\bar{x}$ is approximately normally distributed, with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$. The approximation becomes more accurate as $n$ becomes large.

Example 6. Achievement test scores of all high school seniors in a certain state have mean $\mu = 60$ and variance $\sigma^2 = 64$. A random sample of $n = 100$ students from a large high school had a mean score of 58. Is there evidence to suggest that this high school is inferior?

Solution. Let $\bar{x}$ denote the mean of a random sample of $n = 100$ scores from a population with $\mu = 60$ and $\sigma^2 = 64$. We wish to calculate the probability that the sample mean $\bar{x}$ is at most 58, namely, $P\{\bar{x} \le 58\}$. By the Central Limit Theorem, it follows that

$P\{\bar{x} \le 58\} \approx P\{z \le -2.5\} = 0.0062$

where the standardized value of the mean score 58 is calculated as

$z = \dfrac{58 - 60}{8/\sqrt{100}} = -2.5.$

Since this probability is exceedingly small, a sample mean of 58 or lower would be very unlikely if the students of this high school scored like the state population as a whole. This evidence suggests that the average score for this high school is inferior.
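The calculation in Example 6 can be checked numerically. The following Python sketch standardizes a sample mean and evaluates the normal probability with the standard library class `statistics.NormalDist`. The function name `prob_mean_at_most` is an assumption made for this illustration, and the result is only as good as the normal approximation given by the Central Limit Theorem.

```python
from math import sqrt
from statistics import NormalDist


def prob_mean_at_most(x_bar, mu, sigma, n):
    """Approximate P{sample mean <= x_bar} using the Central Limit Theorem.

    Returns the standardized value z and the standard normal
    probability P{Z <= z}.
    """
    standard_error = sigma / sqrt(n)       # sigma / sqrt(n)
    z = (x_bar - mu) / standard_error      # standardized value of x_bar
    return z, NormalDist().cdf(z)


# Values from Example 6: mu = 60, sigma^2 = 64, n = 100, observed mean 58.
z, p = prob_mean_at_most(58, mu=60, sigma=sqrt(64), n=100)
print(round(z, 2), round(p, 4))
```

The same function applies to any "at most" probability for a sample mean, provided $n$ is large enough for the approximation to be reasonable.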
7.5 The Sampling Distribution of the Sample Mean

Theorem 1 (The Sampling Distribution of the Sample Mean $\bar{x}$).

* If a random sample of $n$ measurements is selected from a population with mean $\mu$ and standard deviation $\sigma$, the sampling distribution of the sample mean $\bar{x}$ will have mean $\mu$ and standard deviation $\sigma/\sqrt{n}$.
* If the population has a normal distribution, the sampling distribution of the sample mean $\bar{x}$ will be exactly normally distributed with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$.
* If the population distribution is nonnormal, the sampling distribution of the sample mean $\bar{x}$ will be approximately normally distributed, with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$, for large samples (by the Central Limit Theorem).

Definition 6. The standard deviation of a statistic used as an estimator of a population parameter is also called the standard error of the estimator (abbreviated SE) because it refers to the precision of the estimator. Therefore, the standard deviation of $\bar{x}$, given by $\sigma/\sqrt{n}$, is referred to as the standard error of the mean, abbreviated as $SE(\bar{x})$ or just SE.

Example 7. The duration of Alzheimer's disease from the onset of symptoms until death ranges from 3 to 20 years; the average is 8 years with a standard deviation of 4 years. The administrator of a large medical center randomly selects the medical records of 30 deceased Alzheimer's patients from the medical center's database and records the average duration. Find the approximate probability that the average (a) is less than 7 years, (b) exceeds 7 years, (c) lies within 1 year of the population mean $\mu = 8$.

Solution.
The standard error is

$\dfrac{\sigma}{\sqrt{n}} = \dfrac{4}{\sqrt{30}} = 0.73.$

(a) To find the probability that the average is less than 7 years, we need to calculate the standardized value of 7:

$z = \dfrac{7 - 8}{0.73} = -1.37.$

Then the desired probability is

$P\{\bar{x} < 7\} \approx P\{z < -1.37\} = 0.0853.$

(b) The probability that the average exceeds 7 years is

$P\{\bar{x} > 7\} \approx P\{z > -1.37\} = 1 - 0.0853 = 0.9147.$

(c) To find the probability that the average lies within 1 year of the population mean $\mu = 8$, we need to calculate the standardized values of 7 and 9:

$\dfrac{7 - 8}{0.73} = -1.37$ and $\dfrac{9 - 8}{0.73} = 1.37.$

Then the required probability is

$P\{7 < \bar{x} < 9\} \approx P\{-1.37 < z < 1.37\} = P\{z < 1.37\} - P\{z < -1.37\} = 0.9147 - 0.0853 = 0.8294.$

Example 8. To avoid difficulties with the Federal Trade Commission or state and local consumer protection agencies, a beverage bottler must make reasonably certain that 12-ounce bottles actually contain 12 ounces of beverage. To determine whether a bottling machine is working satisfactorily, one bottler randomly samples 30 bottles per hour and measures the amount of beverage in each bottle. The mean $\bar{x}$ of the 30 fill measurements is used to decide whether to readjust the amount of beverage delivered per bottle by the filling machine. If records show that the amount of fill per bottle is normally distributed, with a standard deviation of 0.3 ounces, and if the bottling machine is set to produce a mean fill per bottle of 12 ounces, what is the approximate probability that the sample mean $\bar{x}$ of the 30 test bottles is less than 11.9 ounces?

Solution. The standard error is

$\dfrac{\sigma}{\sqrt{n}} = \dfrac{0.3}{\sqrt{30}} = 0.055.$

To find the probability that the sample mean of the 30 test bottles is less than 11.9 ounces, we need to calculate the standardized value of 11.9:

$z = \dfrac{11.9 - 12}{0.055} = -1.82.$

The required probability is then

$P\{\bar{x} < 11.9\} = P\{z < -1.82\} = 0.0344.$

Since this probability is very small, the company should not have difficulties with the Federal Trade Commission or state and local consumer protection agencies.

Example 9.
An electronics firm manufactures light bulbs that have a length of life with mean 800 hours and a standard deviation of 80 hours. Find the probability that a random sample of 64 bulbs will have an average life of greater than 775 hours.

Solution. The standard error is

$\dfrac{\sigma}{\sqrt{n}} = \dfrac{80}{\sqrt{64}} = 10.$

To find the probability that the sample mean of the 64 bulbs is greater than 775 hours, we need to calculate the standardized value of 775:

$z = \dfrac{775 - 800}{10} = -2.5.$

The required probability is then

$P\{\bar{x} > 775\} \approx P\{z > -2.5\} = 1 - P\{z < -2.5\} = 1 - 0.0062 = 0.9938.$

HOMEWORK: pp. 273-274: 7.19, 7.24, 7.29, 7.30, 7.31, 7.33

7.6 The Sampling Distribution of the Sample Proportion

Let $x$ be a binomial random variable with $n$ trials and probability $p$ of success. Here the parameter $p$ can also be referred to as the population proportion of success. Since $x$ represents the number of successes in $n$ trials, the sample proportion of success

$\hat{p} = \dfrac{x}{n}$

will be used to estimate the population proportion $p$.

The binomial random variable $x$ has mean $\mu = np$ and standard deviation $\sigma = \sqrt{npq}$. Since $\hat{p}$ is simply the value of $x$ expressed as a proportion, $\hat{p} = x/n$, the sampling distribution of $\hat{p}$ is identical to the probability distribution of $x$, except that it has a new scale along the horizontal axis. Because of this change of scale, the mean and standard deviation of $\hat{p}$ are also rescaled, so that the mean of the sampling distribution is $p$ and its standard error is

$SE(\hat{p}) = \sqrt{\dfrac{pq}{n}}$ where $q = 1 - p$.

Just as we can approximate the probability distribution of the binomial random variable $x$ with a normal distribution when the sample size $n$ is large, we can do the same with the sampling distribution of $\hat{p}$.

Theorem 2 (Properties of the Sampling Distribution of the Sample Proportion $\hat{p}$). If a random sample of $n$ observations is drawn from a binomial population with parameter $p$, then the sampling distribution of the sample proportion

$\hat{p} = \dfrac{x}{n}$

will have mean $p$ and standard deviation

$SE(\hat{p}) = \sqrt{\dfrac{pq}{n}}$ where $q = 1 - p$.
When the sample size $n$ is large, the sampling distribution of $\hat{p}$ can be approximated by a normal distribution. The approximation will be adequate if $np > 5$ and $nq > 5$.

Example 10. In a survey, 500 mothers and fathers were asked about the importance of sports for boys and girls. Of the parents interviewed, 60% agree that the genders are equal and should have equal opportunities to participate in sports. Describe the sampling distribution of the sample proportion $\hat{p}$ of parents who agree that the genders are equal and should have equal opportunities.

Solution. Let $p$ denote the population proportion of all parents in the United States who agree that the genders are equal and should have equal opportunities. The sampling distribution of $\hat{p}$ can be approximated by a normal distribution, with mean equal to $p$ and standard error

$SE(\hat{p}) = \sqrt{\dfrac{pq}{n}}$ where $q = 1 - p$.

It should be noted that the sampling distribution of $\hat{p}$ is centered over its mean $p$. Even though we do not know the exact value of $p$ (the sample proportion $\hat{p} = 0.60$ may be larger or smaller than $p$), an approximate value for the standard deviation of the sampling distribution can be found using the sample proportion $\hat{p} = 0.60$ to approximate the unknown value of $p$. Thus,

$SE(\hat{p}) = \sqrt{\dfrac{pq}{n}} \approx \sqrt{\dfrac{\hat{p}\hat{q}}{n}} = \sqrt{\dfrac{(0.60)(0.40)}{500}} = 0.022.$

Now the probability that $\hat{p}$ will fall within $2\,SE(\hat{p}) = 0.044$ of $p$ is given by

$P\{|\hat{p} - p| < 0.044\} = P\left\{\dfrac{|\hat{p} - p|}{SE(\hat{p})} < \dfrac{0.044}{0.022}\right\} \approx P\{|z| < 2\} = P\{-2 < z < 2\} = P\{z < 2\} - P\{z < -2\} = 0.9772 - 0.0228 = 0.9544.$

Therefore, approximately 95% of the time $\hat{p}$ will fall within $2\,SE(\hat{p}) = 0.044$ of the (unknown) value of $p$.

Example 11. Refer to Example 10. Suppose the proportion $p$ of parents in the population is actually equal to 0.55. What is the probability of observing a sample proportion larger than or equal to the observed value $\hat{p} = 0.60$?

Solution.
Since $n = 500$ and $p = 0.55$, we calculate

$SE(\hat{p}) = \sqrt{\dfrac{pq}{n}} = \sqrt{\dfrac{(0.55)(0.45)}{500}} = 0.0222.$

The required probability is

$P\{\hat{p} \ge 0.60\} \approx P\{z \ge 2.25\} = 1 - P\{z \le 2.25\} = 1 - 0.9878 = 0.0122,$

where the standardized value of 0.60 is

$z = \dfrac{0.60 - 0.55}{0.0222} = 2.25.$

That is, if we were to select a random sample of $n = 500$ observations from a population with proportion $p$ equal to 0.55, the probability that the sample proportion $\hat{p}$ would be larger than or equal to 0.60 is only 0.0122.

HOMEWORK: pp. 279-281: 7.37, 7.41, 7.43, 7.45, 7.47
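The standard error and tail probability used in Examples 10 and 11 can be reproduced with a short Python sketch. The function names `standard_error_proportion` and `prob_proportion_at_least` are assumptions made for this illustration. The code relies on the normal approximation to the sampling distribution of $\hat{p}$, which is reasonable here because $np$ and $nq$ are both far above 5.

```python
from math import sqrt
from statistics import NormalDist


def standard_error_proportion(p, n):
    """Standard error of the sample proportion: sqrt(p * q / n), q = 1 - p."""
    return sqrt(p * (1 - p) / n)


def prob_proportion_at_least(p_hat, p, n):
    """Approximate P{sample proportion >= p_hat} when the true proportion is p.

    Returns the standardized value z and the upper-tail normal probability.
    """
    z = (p_hat - p) / standard_error_proportion(p, n)
    return z, 1 - NormalDist().cdf(z)


# Example 10: estimate the standard error using p_hat = 0.60 in place of p.
se_est = standard_error_proportion(0.60, 500)

# Example 11: assume the true proportion is p = 0.55.
z, tail = prob_proportion_at_least(0.60, p=0.55, n=500)
print(round(se_est, 3), round(z, 2), round(tail, 4))
```

Because the code does not round $z$ to two decimals before looking up the probability, the tail probability it prints may differ slightly in the fourth decimal place from the table-based value 0.0122.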