Business Statistics: A First Course, 5th Edition
Chapter 7: Sampling and Sampling Distributions
Basic Business Statistics, 11e © 2009 Prentice-Hall, Inc. Chap 7-1

Learning Objectives
In this chapter, you learn:
- To distinguish between different sampling methods
- The concept of the sampling distribution
- To compute probabilities related to the sample mean and the sample proportion
- The importance of the Central Limit Theorem

Why Sample?
- Selecting a sample is less time-consuming than selecting every item in the population (a census).
- Selecting a sample is less costly than selecting every item in the population.
- An analysis of a sample is less cumbersome and more practical than an analysis of the entire population.

A Sampling Process Begins With a Sampling Frame
- The sampling frame is a listing of all the items (sampling units) that make up the population.
- Frames are data sources such as population lists, directories, or maps.
- Inaccurate or biased results can occur if a frame excludes certain portions of the population.
- Using different frames to generate data can lead to dissimilar conclusions.

Types of Samples
- Nonprobability samples
  - Judgment
  - Convenience
- Probability samples
  - Simple random
  - Stratified
  - Cluster
  - Systematic

Types of Samples: Nonprobability Sample
- In a nonprobability sample, items included are chosen without regard to their probability of occurrence.
  Items are instead picked subjectively by the person doing the sampling, or individuals volunteer themselves into the sample.
- In convenience sampling, items are selected based only on the fact that they are easy, inexpensive, or convenient to sample.
- In a judgment sample, you get the opinions of preselected experts in the subject matter; the sample reflects the subjective judgment of the investigator or of those experts.

Types of Samples: Probability Sample
- In a probability sample, items in the sample are chosen on the basis of known probabilities: every item has a known chance of being included.
- Probability samples: simple random, systematic, stratified, cluster.

Probability Sample: Simple Random Sample
- Every individual or item from the frame has an equal chance of being selected.
- Selection may be with replacement (the selected individual is returned to the frame for possible reselection) or without replacement (the selected individual is not returned to the frame).
- Samples can be obtained from a table of random numbers or from a computer random number generator.

Selecting a Simple Random Sample Using a Random Number Table
Sampling frame for a population with 850 items:

Item Name    Item #
Bev R.       001
Ulan X.      002
...          ...
Joann P.     849
Paul F.      850
Portion of a random number table:

49280 88924 35779 00283 81163 07275
11100 02340 12860 74697 96644 89439
09893 23997 20048 49420 88872 08401

Reading the digits three at a time gives the first 5 items in a simple random sample:
- Item # 492
- Item # 808
- Item # 892: does not exist (the frame has only 850 items), so ignore it
- Item # 435
- Item # 779
- Item # 002

Probability Sample: Systematic Sample
- Decide on the sample size: n.
- Divide the frame of N individuals into n groups of k individuals each: $k = N/n$.
- Randomly select one individual from the first group of k.
- Select every kth individual thereafter, until n individuals have been selected.
- Example: N = 40, n = 4, so k = 10.
[Figure: frame of 40 individuals with the first group of 10 marked]

Probability Sample: Stratified Sample
- Divide the population into two or more subgroups (called strata) according to some common characteristic, so that every individual belongs to exactly one stratum.
- A simple random sample is selected from each subgroup, with sample sizes proportional to strata sizes.
- Samples from the subgroups are combined into one.
- This is a common technique when sampling a population of voters, stratifying across racial or socio-economic lines.
[Figure: population divided into 4 strata]

Probability Sample: Cluster Sample
- The population is divided into several "clusters," each representative of the population.
- A simple random sample of clusters is selected.
- All items in the selected clusters can be used (the usual case), or items can be chosen from a cluster using another probability sampling technique.
- A common application of cluster sampling involves election exit polls, where certain election districts are selected and sampled.
[Figure: population divided into 16 clusters]
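The random-number-table selection and the systematic selection described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `select_from_random_digits` and `systematic_sample` are hypothetical, only the first four blocks of the table are used, and skipping repeated item numbers (sampling without replacement) is an assumption not stated on the slides.

```python
import random

def select_from_random_digits(digits, frame_size, n, width=3):
    """Read `width`-digit item numbers left to right from a digit string.

    Numbers outside 1..frame_size are ignored. Repeats are also skipped,
    which assumes sampling without replacement.
    """
    chosen = []
    for i in range(0, len(digits) - width + 1, width):
        item = int(digits[i:i + width])
        if 1 <= item <= frame_size and item not in chosen:
            chosen.append(item)
        if len(chosen) == n:
            break
    return chosen

def systematic_sample(N, n, rng=random):
    """Pick a random start in the first group of k = N // n, then every kth item."""
    k = N // n
    start = rng.randint(1, k)          # position within the first group
    return [start + j * k for j in range(n)]

# First four blocks of the random number table: 49280 88924 35779 00283
digits = "49280" "88924" "35779" "00283"
first_five = select_from_random_digits(digits, frame_size=850, n=5)
print(first_five)                      # [492, 808, 435, 779, 2]

# Systematic sample with N = 40 and n = 4, so k = 10
print(systematic_sample(40, 4))
```

Item 892 is dropped because it exceeds the frame size of 850, which matches the "does not exist so ignore" step on the slide.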
[Figure for the cluster sample slide: the randomly selected clusters that form the sample]

Probability Sample: Comparing Sampling Methods
- Simple random sample and systematic sample
  - Simple to use
  - May not be a good representation of the population's underlying characteristics
- Stratified sample
  - Ensures representation of individuals across the entire population
- Cluster sample
  - More cost effective (survey costs are easier to control)
  - Less efficient (needs a larger sample to reach the same level of precision)

Evaluating Survey Worthiness
- What is the purpose of the survey?
- Is the survey based on a probability sample?
- Coverage error: is the frame appropriate? An inappropriate frame introduces selection bias.
- Nonresponse error: follow up on nonresponses.
- Measurement error: good questions elicit good responses, so the questionnaire needs careful design.
- Sampling error: always exists, but can be controlled.

Types of Survey Errors
- Coverage error or selection bias
  - Exists if some groups are excluded from the frame and have no chance of being selected
- Nonresponse error or bias
  - People who do not respond may be different from those who do respond
- Sampling error
  - Variation from sample to sample will always exist
- Measurement error
  - Due to weaknesses in question design, respondent error, and the interviewer's effects on the respondent ("Hawthorne effect")

Types of Survey Errors (continued)
- Coverage error: excluded from frame
- Nonresponse error: follow up on nonresponses
- Sampling error: random differences from sample to sample
- Measurement error: bad or leading question
Sampling Distributions
- A sampling distribution is the distribution of all of the possible values of a sample statistic for a given sample size (and sampling scheme) selected from a population.
- For example, suppose you sample 50 students from your college and compute their mean GPA. If you drew many different samples of 50, you would generally compute a different mean for each sample. We are interested in the distribution of all the mean GPAs we might calculate over every possible sample of 50 students (all the ways of choosing 50 from the N students).

Developing a Sampling Distribution
- Assume there is a population of four individuals: A, B, C, D.
- Population size N = 4
- Random variable, X, is the age of the individuals
- Values of X: 18, 20, 22, 24 (years)

Developing a Sampling Distribution (continued)
Summary measures for the population distribution:

$\mu = \frac{\sum X_i}{N} = \frac{18 + 20 + 22 + 24}{4} = 21$

$\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} = 2.236$

[Figure: bar chart of P(x) against x = 18, 20, 22, 24 for individuals A, B, C, D; uniform distribution]

Developing a Sampling Distribution (continued)
Now consider all possible samples of size n = 2, drawn by simple random sampling with replacement. There are 4 × 4 = 16 possible samples:

1st Obs   2nd Observation
          18      20      22      24
18        18,18   18,20   18,22   18,24
20        20,18   20,20   20,22   20,24
22        22,18   22,20   22,22   22,24
24        24,18   24,20   24,22   24,24

The 16 sample means:

1st Obs   2nd Observation
          18   20   22   24
18        18   19   20   21
20        19   20   21   22
22        20   21   22   23
24        21   22   23   24
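The enumeration above is small enough to reproduce directly. The following sketch (variable names are illustrative, not from the slides) lists all 16 samples of size 2 drawn with replacement from the ages 18, 20, 22, 24, tabulates the sample means, and computes the mean and standard deviation of both the population and the sample means.

```python
from collections import Counter
from itertools import product
from math import sqrt
from statistics import mean, pstdev

ages = [18, 20, 22, 24]                     # population values of X
samples = list(product(ages, repeat=2))     # all ordered samples, with replacement
sample_means = [mean(s) for s in samples]

# Population summary measures (pstdev divides by N, as on the slide)
mu = mean(ages)
sigma = pstdev(ages)

# Summary measures of the sampling distribution of the mean
mu_xbar = mean(sample_means)
sigma_xbar = pstdev(sample_means)

# Frequency of each possible sample mean out of the 16 samples
counts = Counter(sample_means)
for value in sorted(counts):
    print(value, counts[value], counts[value] / len(samples))

print(len(samples), mu, round(sigma, 3))            # 16 21 2.236
print(mu_xbar, round(sigma_xbar, 2))                # 21 1.58
print(round(sigma / sqrt(2), 2))                    # 1.58, i.e. sigma / sqrt(n)
```

The sample means centre on the population mean of 21, and their standard deviation equals the population standard deviation divided by the square root of the sample size.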
Developing a Sampling Distribution (continued)
Sampling distribution of all sample means:
[Figure: the 16 sample means from the previous slide, plotted as a bar chart of $P(\bar{X})$ against $\bar{X}$ = 18, 19, ..., 24. The distribution is no longer uniform.]

Developing a Sampling Distribution (continued)
Summary measures of this sampling distribution:

$\mu_{\bar{X}} = \frac{\sum \bar{X}_i}{N} = \frac{18 + 19 + 19 + \cdots + 24}{16} = 21$

$\sigma_{\bar{X}} = \sqrt{\frac{\sum (\bar{X}_i - \mu_{\bar{X}})^2}{N}} = \sqrt{\frac{(18 - 21)^2 + (19 - 21)^2 + \cdots + (24 - 21)^2}{16}} = 1.58$

Here N = 16 is the number of possible samples.

Comparing the Population Distribution to the Sample Means Distribution
- Population (N = 4): $\mu = 21$, $\sigma = 2.236$
- Sample means distribution (n = 2): $\mu_{\bar{X}} = 21$, $\sigma_{\bar{X}} = 1.58$
[Figure: bar chart of P(X) for X = 18, 20, 22, 24 (A, B, C, D) beside a bar chart of $P(\bar{X})$ for $\bar{X}$ = 18, 19, ..., 24]

Sample Mean Sampling Distribution: Standard Error of the Mean
- Different samples of the same size from the same population will yield different sample means.
- A measure of the variability in the mean from sample to sample is given by the standard error of the mean:

  $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$

  (This assumes that sampling is with replacement, or without replacement from an infinite or very large population.)
- Note that the standard error of the mean decreases as the sample size increases.

Sample Mean Sampling Distribution: If the Population is Normal
- If a population is normally distributed with mean μ and standard deviation σ, the sampling distribution of $\bar{X}$ is also normally distributed, with mean $\mu_{\bar{X}} = \mu$
and standard error $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$.

Z-value for Sampling Distribution of the Mean
- Z-value for the sampling distribution of $\bar{X}$:

  $Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$

  where:
  $\bar{X}$ = sample mean
  μ = population mean
  σ = population standard deviation
  n = sample size

Sampling Distribution Properties
- $\mu_{\bar{X}} = \mu$: the expected value of $\bar{X}$ equals the population mean (i.e. $\bar{X}$ is an unbiased estimator of μ).
[Figure: normal population distribution centred at μ above a normal sampling distribution centred at $\mu_{\bar{X}}$; both have the same mean]

Sampling Distribution Properties (continued)
- As n increases, $\sigma_{\bar{X}}$ decreases.
[Figure: two sampling distributions centred at μ; the curve for the larger sample size is narrower than the curve for the smaller sample size]

Determining an Interval Including a Fixed Proportion of the Sample Means
Find a symmetrically distributed interval around μ that will include 95% of the sample means when μ = 368, σ = 15, and n = 25.
- Since the interval contains 95% of the sample means, 5% of the sample means will be outside the interval.
- Since the interval is symmetric, 2.5% will be above the upper limit and 2.5% will be below the lower limit.
- From the standardized normal table, the Z score with 2.5% (0.0250) below it is -1.96 and the Z score with 2.5% (0.0250) above it is 1.96.

Determining an Interval Including a Fixed Proportion of the Sample Means (continued)
- Calculating the lower limit of the interval:

  $\bar{X}_L = \mu + Z \frac{\sigma}{\sqrt{n}} = 368 + (-1.96) \frac{15}{\sqrt{25}} = 362.12$

- Calculating the upper limit of the interval:

  $\bar{X}_U = \mu + Z \frac{\sigma}{\sqrt{n}} = 368 + (1.96) \frac{15}{\sqrt{25}} = 373.88$

- 95% of all sample means of sample size 25 are between 362.12 and 373.88.
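The interval calculation above can be checked numerically. This is a minimal sketch using Python's standard-library `statistics.NormalDist` in place of the printed normal table; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 368, 15, 25
coverage = 0.95

# Z score leaving (1 - coverage) / 2 in each tail; about 1.96 for 95%
z = NormalDist().inv_cdf(1 - (1 - coverage) / 2)

standard_error = sigma / sqrt(n)        # sigma / sqrt(n) = 15 / 5 = 3
lower = mu - z * standard_error
upper = mu + z * standard_error

print(round(z, 2))                      # 1.96
print(round(lower, 2), round(upper, 2)) # 362.12 373.88
```

Changing `coverage` gives the corresponding interval for other proportions of the sample means.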
Sample Mean Sampling Distribution: If the Population is not Normal
- We can apply the Central Limit Theorem:
  - Even if the population is not normal, sample means from the population will be approximately normal as long as the sample size is large enough.
- The properties of the sampling distribution are unchanged:

  $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$

Central Limit Theorem
- As the sample size gets large enough (n ↑), the sampling distribution of $\bar{X}$ becomes almost normal regardless of the shape of the population.
[Figure: sampling distribution of $\bar{X}$ approaching a normal curve as n increases]

Sample Mean Sampling Distribution: If the Population is not Normal (continued)
Sampling distribution properties:
- Central tendency: $\mu_{\bar{X}} = \mu$
- Variation: $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
[Figure: a non-normal population distribution with mean μ above its sampling distribution, which becomes normal as n increases; the curve for the larger sample size is narrower than the curve for the smaller sample size]

How Large is Large Enough?
- For most distributions, n > 30 will give a sampling distribution that is nearly normal.
- For fairly symmetric distributions, n > 15 will usually give a sampling distribution that is almost normal.
- For normal population distributions, the sampling distribution of the mean is always normally distributed.

Example
- Suppose a population has mean μ = 8 and standard deviation σ = 3. Suppose a random sample of size n = 36 is selected.
- What is the probability that the sample mean is between 7.8 and 8.2?
Example (continued)
Solution:
- Even if the population is not normally distributed, the Central Limit Theorem can be used (n = 36 > 30),
- so the sampling distribution of $\bar{X}$ is approximately normal,
- with mean $\mu_{\bar{X}} = 8$
- and standard deviation $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{36}} = 0.5$.

Example (continued)
Solution (continued):

$P(7.8 \le \bar{X} \le 8.2) = P\left(\frac{7.8 - 8}{3/\sqrt{36}} \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le \frac{8.2 - 8}{3/\sqrt{36}}\right) = P(-0.4 \le Z \le 0.4) = 0.3108$

[Figure: a population distribution of unknown shape with μ = 8; sampling gives the sampling distribution of $\bar{X}$ with $\mu_{\bar{X}} = 8$ and the interval 7.8 to 8.2 marked; standardizing gives the standard normal distribution with $\mu_Z = 0$ and the interval -0.4 to 0.4 marked, with area 0.1554 + 0.1554]

Population Proportions
- π = the proportion of the population having some characteristic.
- The sample proportion (p) provides an estimate of π:

  $p = \frac{X}{n} = \frac{\text{number of items in the sample having the characteristic of interest}}{\text{sample size}}$

- $0 \le p \le 1$
- p is approximately normally distributed when n is large (assuming sampling with replacement from a finite population or without replacement from an infinite population).

Sampling Distribution of p
- Approximated by a normal distribution if $n\pi \ge 5$ and $n(1 - \pi) \ge 5$, where

  $\mu_p = \pi$ and $\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}}$

  (where π = population proportion)
[Figure: sampling distribution of p, showing P(p) against p from 0 to 1]

Z-Value for Proportions
Standardize p to a Z value with the formula:

$Z = \frac{p - \pi}{\sigma_p} = \frac{p - \pi}{\sqrt{\frac{\pi(1 - \pi)}{n}}}$

Example
- If the true proportion of voters who support Proposition A is π = 0.4, what is the probability that a sample of size 200 yields a sample proportion between 0.40 and 0.45?
- That is, if π = 0.4 and n = 200, what is P(0.40 ≤ p ≤ 0.45)?

Example (continued)
If π = 0.4 and n = 200, what is P(0.40 ≤ p ≤ 0.45)?
- Find $\sigma_p$:

  $\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} = \sqrt{\frac{0.4(1 - 0.4)}{200}} = 0.03464$

- Convert to standardized normal:

  $P(0.40 \le p \le 0.45) = P\left(\frac{0.40 - 0.40}{0.03464} \le Z \le \frac{0.45 - 0.40}{0.03464}\right) = P(0 \le Z \le 1.44)$

Example (continued)
If π = 0.4 and n = 200, what is P(0.40 ≤ p ≤ 0.45)?
- Use the standardized normal table: P(0 ≤ Z ≤ 1.44) = 0.4251
[Figure: sampling distribution of p with the area between 0.40 and 0.45 shaded, standardized to the standard normal distribution with the area 0.4251 between Z = 0 and Z = 1.44 shaded]

Chapter Summary
- Discussed probability and nonprobability samples
- Described four common probability samples
- Examined survey worthiness and types of survey errors
- Introduced sampling distributions
- Described the sampling distribution of the mean
  - For normal populations
  - Using the Central Limit Theorem
- Described the sampling distribution of a proportion
- Calculated probabilities using sampling distributions
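As a closing check, the two worked probability examples from this chapter (the sample mean with μ = 8, σ = 3, n = 36, and the sample proportion with π = 0.4, n = 200) can be recomputed with the standard library. This is a sketch under the normal approximation used on the slides; the helper names `prob_mean_between` and `prob_proportion_between` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def prob_mean_between(lo, hi, mu, sigma, n):
    """P(lo <= sample mean <= hi) using standard error sigma / sqrt(n)."""
    se = sigma / sqrt(n)
    return Z.cdf((hi - mu) / se) - Z.cdf((lo - mu) / se)

def prob_proportion_between(lo, hi, pi, n):
    """P(lo <= sample proportion <= hi) using standard error sqrt(pi(1 - pi) / n)."""
    se = sqrt(pi * (1 - pi) / n)
    return Z.cdf((hi - pi) / se) - Z.cdf((lo - pi) / se)

# Sample mean example: mu = 8, sigma = 3, n = 36
p_mean = prob_mean_between(7.8, 8.2, mu=8, sigma=3, n=36)

# Sample proportion example: pi = 0.4, n = 200
p_prop = prob_proportion_between(0.40, 0.45, pi=0.4, n=200)

print(round(p_mean, 4))   # 0.3108, matching the table value
print(round(p_prop, 3))   # close to the table value 0.4251
```

The proportion result differs slightly from 0.4251 in the fourth decimal place because the slides round Z to 1.44 before using the table, whereas the code keeps the unrounded Z value.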