Chapter 9: Estimation Using a Single Sample
Confidence Intervals

Inferential Statistics
• Our study of confidence intervals begins our study of inferential statistics
• In inferential statistics, our objective is to learn about a population from a sample of data
• We use a sample of data to decrease our uncertainty about the population the sample was drawn from
• More specifically, we’ll be using samples of data to estimate unknown population parameters such as the population mean $\mu$ and the population proportion $\pi$

Point Estimates
• A point estimate is a single number derived from a sample of data (a statistic) that represents a plausible value for a population parameter
• First, we decide which statistic is appropriate. We then collect a random sample of data. The computed statistic is our point estimate
  – Example: the sample mean $\bar{X}$ as a point estimate for $\mu$

More than one choice
• Suppose we are interested in the proportion of American voters who support gay marriage
• The natural statistic is the sample proportion
  – $p$ as an estimate for $\pi$
• Sometimes there’s more than one choice
• Sample mean, trimmed mean, or median as a point estimate for the population mean
• How do you choose? Choose the statistic that tends, on average, to be closest to the true value.

Biased and Unbiased Statistic
• When there’s more than one choice, we want to choose the statistic that is most accurate
• Sampling distributions of statistics give us information about how accurate a statistic is for estimating a population parameter
• Statistics whose sampling distributions are centered on the parameter we’re trying to estimate are called unbiased
• The two unbiased statistics we’ll be studying are the sample mean and the sample proportion

Accuracy of Point Estimates
• Even though we might select an unbiased statistic, how accurate is the single number that we calculate?
• Remember sampling variability?
• Example – samples of 50 from a normal distribution
• Using an unbiased statistic with a small standard deviation guarantees no systematic tendency to underestimate or overestimate the parameter, and the estimates will tend to be relatively close to the true value

Confidence Intervals
• How accurate a point estimate is depends on which sample you happen to draw from the population
• While the point estimate from an unbiased statistic may be our best single-number guess, it’s not the only plausible estimate
• An alternative to a single-number estimate is to provide a range of values, or an interval, that we feel very confident the true value will fall into
• We call this type of estimate a confidence interval

Definition of Confidence Interval
• An interval of plausible values for the characteristic. It is constructed so that, with a chosen degree of confidence, the value of the characteristic (the parameter) will be captured in the interval

Confidence Interval
• Confidence Interval = Statistic ± Critical Value × Statistic Std Dev
• Statistic
• Standard deviation of the sampling distribution
• Critical value
• Associated confidence level
  – How much confidence we have in the method used to construct the CI
  – Not our confidence in any particular interval

Basic Concept of CI
• We start with the sampling distribution of the statistic we are using
• We will be using sampling distributions that are well approximated by a normal distribution
• We take a sample and calculate a point estimate, an (unbiased) statistic, from that sample

Continuing …
• With what we know about normal distributions, we know that about 95% of the statistics calculated from random samples will fall within 2 sd of the mean of the sampling distribution.
• The sampling distribution is centered on the population parameter (its mean equals the parameter)
• If the statistic is within approx 2 sd of the sampling distribution’s mean 95% of the time, then the interval Statistic ± Critical Value × Statistic Std Dev will capture the mean of the sampling distribution, and therefore the parameter, 95% of the time

More …
• The width of the interval is adjusted by selecting a different confidence level
• Typical confidence levels are 90%, 95% and 99%
• The endpoints are found by multiplying the critical value (which is determined by the confidence level) by the sampling distribution standard deviation (the sd of the statistic), then subtracting that product from, and adding it to, the statistic

Large Sample Confidence Interval for a Population Proportion
• Parameter of interest is the population proportion $\pi$
• Statistic used is the sample proportion $p$
• Why “large sample”? From the last chapter, when the sample is large, the sampling distribution of $p$ is approximately normal
• How large is large? $n\pi \ge 10$ and $n(1-\pi) \ge 10$
• We know $\mu_p = \pi$ and $\sigma_p = \sqrt{\pi(1-\pi)/n}$

Large Sample Confidence Interval for a Population Proportion
• Calculate a sample proportion from a random sample
  – $p = \dfrac{\text{number in sample that have the characteristic}}{n}$
• Estimate the standard deviation of the statistic
  – $\sqrt{p(1-p)/n}$, the standard error
• Choose a confidence level – let’s say 95%
• Determine the critical value
  – Use the standard normal table – 1.96
• Calculate your confidence interval
  – Confidence Interval = Statistic ± Critical Value × Statistic Std Dev

Let’s do an example
• Pg 453 Problem # 9.14

In summary
• The Large Sample Confidence Interval for $\pi$
  – $p$ is the sample proportion from a random sample
  – The sample size, $n$, is large: $np \ge 10$ and $n(1-p) \ge 10$
  – The CI is $p \pm z^* \sqrt{p(1-p)/n}$
  – The desired confidence level determines which critical value is used
  – Note: This method is not appropriate for small samples

Choosing the Sample Size
• Terminology: Bound
• Confidence Interval = Statistic ± Critical Value × Statistic Std Dev
• Consider the statistic an estimate of the parameter
• Consider ‘critical value × standard deviation’ the bound on the error of your estimate
• In the case of population proportions the interval is $p \pm z^* \sqrt{p(1-p)/n}$, so the bound is $B = z^* \sqrt{p(1-p)/n}$

Finding appropriate sample size
• Consider that before you do a study, you may be asked to estimate a particular parameter to a certain degree of accuracy
• The question now is: how big a sample should I take to get a specific degree of accuracy at a certain confidence level?
• We use the ‘bound’ to determine sample size: $n = \pi(1-\pi)\left(\dfrac{z^*}{B}\right)^2$
• But the population parameter $\pi$ is unknown, so we make a reasonable estimate, or use .5 as a conservative estimate for $\pi$
• Example – pg 454, 9.25

Confidence Interval for Population Mean
• We’ll look at these cases:
  – Population standard deviation is known
    • $n \ge 30$
    • Small sample but population is approx normal
  – Population standard deviation is unknown
    • $n \ge 30$
    • Small sample but population is approx normal

Sampling Distribution of the Sample Mean
• $\mu_{\bar{X}} = \mu$
• $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$
• When the population is normal, the sampling distribution is normal regardless of sample size
• When the population is not normal, the sampling distribution is approximately normal if the sample size is large (CLT).

Confidence Interval for Population Mean, $\sigma$ Known
• $\bar{X}$ is the sample mean from a random sample
• Sample size is large or population is approximately normal
• Population standard deviation $\sigma$ is known
• CI is: $\bar{X} \pm z^* \dfrac{\sigma}{\sqrt{n}}$

Sampling Distribution of the Sample Mean, $\sigma$ Unknown
• $\mu_{\bar{X}} = \mu$
• $\sigma_{\bar{X}}$ is estimated by $\dfrac{s}{\sqrt{n}}$
• When the population is normal, the sampling distribution is normal regardless of sample size
• When the population is not normal, the sampling distribution is approximately normal if the sample size is large (CLT)

Confidence Interval for Population Mean, $\sigma$ Unknown
• $\bar{X}$ is the sample mean from a random sample
• Sample size is large or population is approximately normal
• Population standard deviation is unknown
• CI is: $\bar{X} \pm t^* \dfrac{s}{\sqrt{n}}$, where $t^*$ is based on $n-1$ df

Student’s t-Distribution
• Recall that a standard normal distribution is a bell-shaped distribution with parameters $\mu = 0$ and $\sigma = 1$
• The t-distribution is bell-shaped and centered on 0.
• There are many t-distributions, differentiated by their degrees of freedom, which is $n-1$
• Each t-curve is a little more spread out than the z-curve, but as n gets larger and larger, the t-distribution approaches the z-curve.

Student’s t-Distribution
• Recall from our study of sampling distributions the properties of the sampling distribution of $\bar{X}$
• When the population standard deviation is not known and the population is normal, the standardized statistic $\dfrac{\bar{X}-\mu}{s/\sqrt{n}}$ is distributed according to the t-distribution with $n-1$ df
• This distribution gives us critical values a little higher than a normal distribution because we don’t know the value of the population standard deviation, which introduces a little more uncertainty

t-Distribution Table
• Appendix III in the back of your textbook

Choosing the Sample Size
• When estimating the population mean using a large sample or a small sample from a normal population, the bound on the error of estimation associated with a 95% confidence level is $B = 1.96\,\dfrac{\sigma}{\sqrt{n}}$, so $n = \left(\dfrac{1.96\,\sigma}{B}\right)^2$
• Since the population standard deviation is usually unknown we can
  – Make a best guess
  – Divide the range by 4

Degrees of Freedom
• The number of independent pieces of information that go into the estimate of the parameter
• The number of values in the calculation of a statistic that are free to vary
• The number of independent pieces of information that go into an estimate minus the number of parameters estimated
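
Worked sketch of the chapter’s formulas
The interval and sample-size formulas in this chapter can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the function names (z_critical, proportion_ci, sample_size_for_proportion, mean_ci) and all the numbers in the usage lines are hypothetical. The z critical value comes from the standard library’s NormalDist; the t critical value is assumed to be looked up by hand in a t table (such as Appendix III) and passed in.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided z* critical value (about 1.96 when confidence = 0.95)."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def proportion_ci(successes, n, confidence=0.95):
    """Large-sample interval: p +/- z* * sqrt(p(1-p)/n)."""
    p = successes / n
    # Large-sample check from the summary slide: np >= 10 and n(1-p) >= 10.
    if n * p < 10 or n * (1 - p) < 10:
        raise ValueError("sample too small for the large-sample method")
    bound = z_critical(confidence) * sqrt(p * (1 - p) / n)
    return p - bound, p + bound


def sample_size_for_proportion(bound, confidence=0.95, pi_guess=0.5):
    """n = pi(1-pi)(z*/B)^2, rounded up; pi_guess = 0.5 is the conservative choice."""
    z = z_critical(confidence)
    return ceil(pi_guess * (1 - pi_guess) * (z / bound) ** 2)


def mean_ci(xbar, sd, n, critical_value):
    """Interval: xbar +/- critical_value * sd / sqrt(n).

    Sigma known:   pass sd = sigma and critical_value = z*.
    Sigma unknown: pass sd = s and critical_value = t* (n - 1 df, from a table).
    """
    bound = critical_value * sd / sqrt(n)
    return xbar - bound, xbar + bound


# Hypothetical numbers, for illustration only.
print(proportion_ci(520, 1000))            # 520 of 1000 sampled voters in favor
print(sample_size_for_proportion(0.03))    # n needed for a bound of 0.03 at 95%
print(mean_ci(50.0, 8.0, 25, 2.064))       # s = 8, n = 25, t* for 24 df at 95%
```

The single mean_ci function covers both the known and unknown standard deviation cases because the two intervals differ only in which standard deviation and which critical value are plugged in.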