The t distributions
One-sample t confidence interval
One-sample t test
Matched pairs t procedures
Robustness of the t procedures

When the sampling distribution of $\bar{x}$ is close to Normal, we can find probabilities involving $\bar{x}$ by standardizing:

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

When we don't know σ, we can estimate it using the sample standard deviation $s_x$. What happens when we standardize?

$$?? = \frac{\bar{x} - \mu}{s_x / \sqrt{n}}$$

This new statistic does not have a Normal distribution!

The standard deviation of $\bar{X}$ can be estimated by

$$SE_{\bar{X}} = \frac{s}{\sqrt{n}}$$

This quantity is called the standard error of the sample mean. The test statistic will no longer be Normally distributed when we use the standard error.
◦ The test statistic will have a new distribution, called the t (or Student's t) distribution.

The t Distributions

When we perform inference about a population mean µ using a t distribution, the appropriate degrees of freedom are found by subtracting 1 from the sample size n, making df = n − 1.

The t Distributions: Degrees of Freedom

Draw an SRS of size n from a large population that has a Normal distribution with mean µ and standard deviation σ. The one-sample t statistic

$$t = \frac{\bar{x} - \mu}{s_x / \sqrt{n}}$$

has the t distribution with degrees of freedom df = n − 1.

The t Distributions

When we standardize based on the sample standard deviation $s_x$, our statistic has a new distribution called a t distribution. The t distribution has a shape similar to that of the standard Normal curve: it is symmetric with a single peak at 0. It differs from the Normal curve in that it has more area in the tails. There is also a different t distribution for each sample size, specified by its degrees of freedom (df).

The one-sample t interval for a population mean is similar in both reasoning and computational detail to the one-sample z interval for a population mean.

The One-Sample t Interval for a Population Mean

Choose an SRS of size n from a population having unknown mean µ.
A level C confidence interval for µ is:

$$\bar{x} \pm t^* \frac{s_x}{\sqrt{n}}$$

where t* is the critical value for the t(n − 1) distribution. The margin of error is:

$$t^* \frac{s_x}{\sqrt{n}}$$

This interval is exact when the population distribution is Normal and approximately correct for large n in other cases.

Example: A mutual fund is trying to estimate the return on investment in companies that won quality awards last year. A random sample of 20 such companies is selected, and the return on investment is calculated. The mean of the sample is 14.75 and the standard deviation of the sample is 8.18. Construct a 95% confidence interval for the mean return on investment.

Example: From a running production of corn soy blend we take a sample to measure the content of vitamin C. The results are:

26 31 23 22 11 22 14 31

Find a 95% confidence interval for the content of vitamin C in this production. Give the margin of error.

One-Sample t Test

Choose an SRS of size n from a large population that has an unknown mean µ. To test the hypothesis H0: µ = µ0, compute the one-sample t statistic:

$$t = \frac{\bar{x} - \mu_0}{s_x / \sqrt{n}}$$

Find the P-value by calculating the probability (at degrees of freedom = n − 1) of getting a t statistic this large or larger in the direction specified by the alternative hypothesis Ha.

These P-values are exact if the population distribution is Normal and are approximately correct for large n in other cases. Table D only brackets the P-value within a range; software gives exact P-values.

Test whether the vitamin C content from the sample conforms to specifications of 40.
Test whether the vitamin C content from the sample is lower than the specifications of 40.

Inference about a parameter of a single distribution is less common than comparative inference. In certain circumstances a comparative study makes use of single-sample t procedures. In a matched pairs study, subjects are matched in pairs; the outcomes are compared within each matched pair.
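The one-sample t interval and t statistic for the mutual fund and vitamin C exercises above can be checked with a short script. This is only a sketch using the Python standard library: the helper names `t_interval` and `t_statistic` are made up for illustration, and the critical values in `T_STAR_95` are typed in from a standard t table (the role Table D plays in these notes) rather than computed.

```python
import math
import statistics

# Two-sided 95% critical values t* (upper-tail area 0.025) from a standard
# t table. Assumption: only the df values needed below are listed.
T_STAR_95 = {7: 2.365, 19: 2.093}


def t_interval(xbar, s, n, t_star):
    """Return (lower, upper, margin) for xbar +/- t* * s / sqrt(n)."""
    margin = t_star * s / math.sqrt(n)
    return xbar - margin, xbar + margin, margin


def t_statistic(xbar, s, n, mu0):
    """One-sample t statistic for H0: mu = mu0."""
    return (xbar - mu0) / (s / math.sqrt(n))


# Mutual fund example: n = 20, sample mean 14.75, sample sd 8.18.
fund_lo, fund_hi, fund_me = t_interval(14.75, 8.18, 20, T_STAR_95[19])

# Vitamin C example: compute the summary statistics from the raw data.
vit_c = [26, 31, 23, 22, 11, 22, 14, 31]
n = len(vit_c)
xbar = statistics.mean(vit_c)
s = statistics.stdev(vit_c)  # sample standard deviation (divides by n - 1)
vit_lo, vit_hi, vit_me = t_interval(xbar, s, n, T_STAR_95[n - 1])
t_vit = t_statistic(xbar, s, n, mu0=40)

print(f"Mutual fund 95% CI: ({fund_lo:.2f}, {fund_hi:.2f})")
print(f"Vitamin C: mean = {xbar}, s = {s:.3f}, margin of error = {vit_me:.2f}")
print(f"Vitamin C 95% CI: ({vit_lo:.2f}, {vit_hi:.2f})")
print(f"t statistic against mu0 = 40: {t_vit:.2f} (df = {n - 1})")
```

For the vitamin C tests, the magnitude of the t statistic is then compared with the table value for df = 7 (2.365 for the two-sided test at the 5% level); an exact P-value would need statistical software.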
A matched pairs analysis is appropriate when there are two measurements or observations for each individual and we examine the change from the first to the second. Typically, the observations are "before" and "after" measures in some sense.
◦ For each individual, subtract the "before" measure from the "after" measure.
◦ Analyze the differences using the one-sample confidence interval or significance-testing t procedures (with H0: µdiff = 0).

Example: 20 French teachers attend a course to improve their skills. The teachers take a Modern Language Association listening test at the beginning of the course and at its end. The maximum possible score on the test is 36. The differences between each participant's "before" and "after" scores have sample mean 2.5 and sample standard deviation 2.893. Is the improvement significant? Construct a 95% confidence interval for the mean improvement (in the entire population).

A confidence interval or significance test is called robust if the confidence level or P-value does not change very much when the conditions for use of the procedure are violated.

Using the t Procedures

Except in the case of small samples, the condition that the data are an SRS from the population of interest is more important than the condition that the population distribution is Normal.
◦ Sample size less than 15: Use t procedures if the data appear close to Normal. If the data are clearly skewed or if outliers are present, do not use t.
◦ Sample size at least 15: The t procedures can be used except in the presence of outliers or strong skewness.
◦ Large samples: The t procedures can be used even for clearly skewed distributions when the sample is large, roughly n ≥ 40.
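For the French teachers example, the matched pairs analysis reduces to a one-sample t procedure on the differences. The sketch below works from the summary statistics given in the example; the variable names are illustrative, and the two critical values for df = 19 are assumed inputs taken from a standard t table.

```python
import math

# Summary statistics for the differences (after - before), from the example.
n = 20
mean_diff = 2.5
sd_diff = 2.893

# Critical values for df = n - 1 = 19, typed in from a standard t table.
T_STAR_TWO_SIDED_95 = 2.093  # for the 95% confidence interval
T_CRIT_ONE_SIDED_05 = 1.729  # for Ha: mu_diff > 0 at alpha = 0.05

se = sd_diff / math.sqrt(n)       # standard error of the mean difference
t_stat = (mean_diff - 0) / se     # test of H0: mu_diff = 0
margin = T_STAR_TWO_SIDED_95 * se
ci = (mean_diff - margin, mean_diff + margin)
significant = t_stat > T_CRIT_ONE_SIDED_05

print(f"t = {t_stat:.3f} with df = {n - 1}")
print(f"95% CI for the mean improvement: ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"Significant improvement at alpha = 0.05 (one-sided): {significant}")
```

With n = 20, the guidelines above allow the t procedures as long as the differences show no outliers or strong skewness; that check needs the individual differences, which the example does not list.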
One-sample t
◦ PROC TTEST: gives the confidence interval and significance test

For matched pairs
◦ Still use similar code as for one-sample t
◦ Or use the PAIRED statement

Two-sample problems
The two-sample t procedures
Robustness of the two-sample t procedures
Pooled two-sample t procedures

Two-sample problems typically arise from a randomized comparative experiment with two treatment groups (experimental study). Comparing random samples separately selected from two populations is also a two-sample problem (observational study). Unlike in a matched pairs design, there is no matching of the units in the two samples, and the two samples may be of different sizes.

What if we want to compare the means of some quantitative variable for two populations, Population 1 and Population 2? Our parameters of interest are the population means µ1 and µ2. The best approach is to take separate random samples from each population and to compare the sample means.

Suppose we want to compare the average effectiveness of two treatments in a completely randomized experiment. In this case, the parameters µ1 and µ2 are the true mean responses for Treatment 1 and Treatment 2, respectively. We use the mean response from each of the two groups to make the comparison. In the notation used below, sample i (i = 1, 2) has size $n_i$, sample mean $\bar{x}_i$, and sample standard deviation $s_i$.

When data come from two random samples or two groups in a randomized experiment, the statistic $\bar{x}_1 - \bar{x}_2$ is our best guess for the value of $\mu_1 - \mu_2$. When the two samples are independent of each other, the standard deviation of the statistic $\bar{x}_1 - \bar{x}_2$ is:

$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

Since we don't know the values of the parameters $\sigma_1$ and $\sigma_2$, we replace them in the standard deviation formula with the sample standard deviations.
The result is the standard error of the statistic $\bar{x}_1 - \bar{x}_2$:

$$SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

We standardize the observed difference to obtain a t statistic that tells us how far the observed difference is from its mean in standard deviation units:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

The two-sample t statistic has approximately a t distribution. We can use technology to determine the degrees of freedom OR we can use a conservative approach, using the smaller of n1 − 1 and n2 − 1 for the degrees of freedom.

Two-Sample t Interval for a Difference Between Means

When the Random, Normal, and Independent conditions are met, a level C confidence interval for $\mu_1 - \mu_2$ is

$$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

where t* is the critical value at confidence level C for the t distribution with degrees of freedom either obtained from technology or equal to the smaller of n1 − 1 and n2 − 1.

Example: metabolism rates

Obs  Gender  Mass  Rate
1    F       36.1   995
2    F       54.6  1425
3    F       48.5  1396
4    F       42    1418
5    F       50.6  1502
6    F       42    1256
7    F       40.3  1189
8    F       33.1   913
9    F       42.4  1124
10   F       34.5  1052
11   F       51.1  1347
12   F       41.2  1204
13   M       62    1792
14   M       62.9  1666
15   M       47.4  1362
16   M       48.7  1614
17   M       51.9  1460
18   M       51.9  1867
19   M       46.9  1439

Summary statistics for Rate:
Women (group 1): $n_1 = 12$, $\bar{x}_1 = 1235.1$, $s_1 = 188.3$
Men (group 2): $n_2 = 7$, $\bar{x}_2 = 1600$, $s_2 = 189.2$

Find a 95% CI for the difference in mean metabolism rates between men and women.

Two-Sample t Test for the Difference Between Two Means

Suppose the Random, Normal, and Independent conditions are met. To test the hypothesis H0: µ1 − µ2 = 0, compute the t statistic

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

Find the P-value by calculating the probability of getting a t statistic this large or larger in the direction specified by the alternative hypothesis Ha. Use the t distribution with degrees of freedom approximated by technology or the smaller of n1 − 1 and n2 − 1.

Perform a significance test using α = 0.05.

The two-sample t procedures are more robust than the one-sample t methods, particularly when the distributions are not symmetric.
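The metabolism example can be worked through numerically as follows. This is a sketch with the conservative degrees of freedom, using the summary statistics quoted in the example; the critical value 2.447 for df = 6 is an assumed input from a standard t table, and the difference is taken as women minus men.

```python
import math

# Summary statistics from the metabolism example
# (group 1 = women, group 2 = men).
n1, xbar1, s1 = 12, 1235.1, 188.3
n2, xbar2, s2 = 7, 1600.0, 189.2

# Standard error of xbar1 - xbar2 (unpooled).
se = math.sqrt(s1**2 / n1 + s2**2 / n2)

# Two-sample t statistic for H0: mu1 - mu2 = 0.
diff = xbar1 - xbar2
t_stat = diff / se

# Conservative degrees of freedom: the smaller of n1 - 1 and n2 - 1.
df_conservative = min(n1 - 1, n2 - 1)

# Two-sided 95% critical value for df = 6, typed in from a standard t table.
T_STAR_95_DF6 = 2.447

margin = T_STAR_95_DF6 * se
ci = (diff - margin, diff + margin)
reject_h0 = abs(t_stat) > T_STAR_95_DF6  # two-sided test at alpha = 0.05

print(f"SE = {se:.2f}, t = {t_stat:.2f}, conservative df = {df_conservative}")
print(f"95% CI for mu1 - mu2: ({ci[0]:.1f}, {ci[1]:.1f})")
print(f"Reject H0 at alpha = 0.05: {reject_h0}")
```

Because the interval is for women minus men, both endpoints come out negative; reversing the order of the groups only flips the signs.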
Using the t Procedures
• Except in the case of small samples, the condition that the data are SRSs from the populations of interest is more important than the condition that the population distributions are Normal.
• Sum of the sample sizes less than 15: Use t procedures if the data appear close to Normal. If the data are clearly skewed or if outliers are present, do not use t.
• Sum of the sample sizes at least 15: The t procedures can be used except in the presence of outliers or strong skewness.
• Large samples: The t procedures can be used even for clearly skewed distributions when the sum of the sample sizes is large.

Degrees of freedom

Conservative choice: df = min(n1 − 1, n2 − 1)
◦ This choice is conservative. Software will usually give smaller P-values.

Software-computed df
◦ More accurate.
◦ In our example with metabolism rates, software will give df = 12.6 (use 12 in Table D).
◦ Formula (not needed in our class):

$$df = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{1}{n_1 - 1}\left(\dfrac{s_1^2}{n_1}\right)^2 + \dfrac{1}{n_2 - 1}\left(\dfrac{s_2^2}{n_2}\right)^2}$$

◦ Most of the time there is no difference in the final conclusion.

There are two versions of the two-sample t test: one assuming equal variances ("pooled two-sample test") and one not assuming equal variances ("unequal" variance, as we have studied) for the two populations. They have slightly different formulas and degrees of freedom.

The pooled (equal variance) two-sample t test was often used before computers because it has exactly the t distribution with n1 + n2 − 2 degrees of freedom. However, the assumption of equal variances is hard to check, and thus the unequal variance test is safer.

[Figure: two Normally distributed populations with unequal variances]

Pooled Two-Sample Procedures

Suppose both populations are Normal and they have the same standard deviation. The pooled estimator of σ² is

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

A level C confidence interval for µ1 − µ2 is

$$(\bar{x}_1 - \bar{x}_2) \pm t^* s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

where the degrees of freedom for t* are n1 + n2 − 2.
To test the hypothesis H0: µ1 = µ2 against a one-sided or a two-sided alternative, compute the pooled two-sample t statistic and use the t(n1 + n2 − 2) distribution:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

Two-sample t
◦ Still use similar code as for one-sample t
◦ Add the CLASS statement to designate the two groups or populations
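To see how the two choices of degrees of freedom and the pooled statistic compare on the metabolism data, the formulas above can be evaluated directly. This is a sketch only: `welch_df` and `pooled_t` are hypothetical helper names, not functions from any statistical package, and the inputs are the summary statistics from the example.

```python
import math

# Summary statistics from the metabolism example
# (group 1 = women, group 2 = men).
n1, xbar1, s1 = 12, 1235.1, 188.3
n2, xbar2, s2 = 7, 1600.0, 189.2


def welch_df(s1, n1, s2, n2):
    """Software-style df for the unequal-variance two-sample t procedure."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))


def pooled_t(xbar1, s1, n1, xbar2, s2, n2):
    """Pooled two-sample t statistic and its df (assumes equal variances)."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    se = math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)
    return (xbar1 - xbar2) / se, n1 + n2 - 2


df_software = welch_df(s1, n1, s2, n2)
t_pooled, df_pooled = pooled_t(xbar1, s1, n1, xbar2, s2, n2)

print(f"Unequal-variance df: {df_software:.1f}")
print(f"Pooled t = {t_pooled:.2f} with df = {df_pooled}")
```

In this example the two sample standard deviations are nearly equal, so the pooled and unpooled t statistics are close; the two procedures differ mainly in their degrees of freedom.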