Comparing Two Population Means: The Two-Sample T-Test and T-Interval

Example
Do male and female college students differ with respect to their fastest reported driving speed?
• Population of all male college students: a sample of n1 = 17 males reports an average of 102.1 mph.
• Population of all female college students: a sample of n2 = 21 females reports an average of 85.7 mph.

Comparative Observational Study
• A research study in which two or more groups are compared with respect to some measurement or response.
• The groups, determined by their natural characteristics, are merely "observed."

Graphical summary of sample data
[Figure: fastest driving speed (mph, axis from 75 to 145) plotted separately for male and female students]

Numerical summary of sample data (fastest driving speed, mph)

Gender   N    Mean   Median  TrMean  StDev  SE Mean
female   21   85.71   85.00   85.26   9.39   2.05
male     17  102.06  100.00  101.00  17.05   4.14

Gender   Minimum  Maximum   Q1      Q3
female   75.00    105.00    77.50   92.50
male     75.00    145.00    90.00  115.00

The difference in the sample means is 102.06 - 85.71 = 16.35 mph.

The Question in Statistical Notation
Let $\mu_M$ = the average fastest speed of all male students, and $\mu_F$ = the average fastest speed of all female students. Then we want to know whether $\mu_M \ne \mu_F$. This is equivalent to knowing whether $\mu_M - \mu_F \ne 0$.

All possible questions in statistical notation
In general, we can always compare two averages by seeing how their difference compares to 0:

This comparison…       is equivalent to…
$\mu_1 \ne \mu_2$       $\mu_1 - \mu_2 \ne 0$
$\mu_1 > \mu_2$         $\mu_1 - \mu_2 > 0$
$\mu_1 < \mu_2$         $\mu_1 - \mu_2 < 0$

Set up hypotheses
• Null hypothesis:
  – H0: $\mu_M = \mu_F$ [equivalent to $\mu_M - \mu_F = 0$]
• Alternative hypothesis:
  – Ha: $\mu_M \ne \mu_F$ [equivalent to $\mu_M - \mu_F \ne 0$]

Make initial assumption
• Assume the null hypothesis is true.
• That is, assume $\mu_M = \mu_F$.
• Or, equivalently, assume $\mu_M - \mu_F = 0$.

Determine the P-value
• P-value = "How likely is it that our sample means would differ by as much as 16.35 mph if the difference in population means really is 0?"
• The P-value, 0.001, is small. Our sample result is not likely if the null hypothesis is true.
• Reject the null hypothesis.
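The SE Mean column and the 16.35 mph difference in the numerical summary follow directly from the summary statistics. A minimal Python sketch, assuming only the rounded values shown in the table (the `summary` dictionary and the `se_mean` helper are illustrative names, not from the source), recomputes the standard error of each sample mean as s/√n and the observed difference:

```python
import math

# Rounded summary statistics copied from the numerical summary (fastest speed, mph).
summary = {
    "female": {"n": 21, "mean": 85.71, "sd": 9.39},
    "male": {"n": 17, "mean": 102.06, "sd": 17.05},
}


def se_mean(sd: float, n: int) -> float:
    """Standard error of a single sample mean: s / sqrt(n)."""
    return sd / math.sqrt(n)


for group, s in summary.items():
    print(f"{group}: SE Mean = {se_mean(s['sd'], s['n']):.2f}")

# Observed difference in sample means (male minus female).
diff = summary["male"]["mean"] - summary["female"]["mean"]
print(f"difference in sample means = {diff:.2f} mph")
```

The recomputed values should agree with the SE Mean column (2.05 and 4.14) to the two decimals Minitab reports.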
Make a decision
• There is sufficient evidence, at the 0.05 level of significance, to conclude that the average reported fastest driving speed of all male college students differs from the average reported fastest driving speed of all female college students.

How the P-value is calculated
The P-value is determined by standardizing, that is, by calculating the two-sample test statistic...

$$t = \frac{\text{difference in sample means} - \text{hypothesized difference}}{\text{standard error of the difference}}$$

…and comparing the value of the test statistic to the appropriate sampling distribution. The sampling distribution depends on how you estimate the standard error of the difference.

If the variances of the measurements of the two groups are not equal...
Estimate the standard error of the difference as:

$$\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

Then the sampling distribution is an approximate t distribution with a complicated formula for the degrees of freedom.

If the variances of the measurements of the two groups are equal...
Estimate the standard error of the difference using the common pooled variance:

$$\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \quad\text{where}\quad s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

Then the sampling distribution is a t distribution with $n_1 + n_2 - 2$ degrees of freedom.

Assume the variances are equal only if neither sample standard deviation is more than twice the other sample standard deviation.

Two-sample t-test in Minitab
• Select Stat. Select Basic Statistics.
• Select 2-sample t to get a pop-up window.
• Click the radio button before Samples in one Column. Put the measurement variable in the Samples box, and put the grouping variable in the Subscripts box.
• Specify your alternative hypothesis.
• If appropriate, select Assume Equal Variances.
• Select OK.
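The calculation described under "How the P-value is calculated" can be sketched in Python from summary statistics alone. This is a minimal sketch, not Minitab's implementation: the function names are hypothetical, the Welch-Satterthwaite formula is assumed to be the "complicated formula" for the unequal-variance degrees of freedom, and the two-sided P-value is obtained by numerically integrating the t density with Simpson's rule rather than by calling a statistics library.

```python
import math


def pooled_se(s1, n1, s2, n2):
    """Pooled standard error of the difference; returns (se, df)."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return math.sqrt(sp2 * (1 / n1 + 1 / n2)), n1 + n2 - 2


def welch_se(s1, n1, s2, n2):
    """Unpooled standard error and (assumed) Welch-Satterthwaite df."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return math.sqrt(v1 + v2), df


def t_pdf(x, df):
    """Density of Student's t distribution (df may be non-integer)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|): integrate the density over [0, |t|] with Simpson's rule."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # approximates P(0 <= T <= |t|)
    return max(0.0, 2 * (0.5 - area))  # guard against tiny negative rounding


def two_sample_t(m1, s1, n1, m2, s2, n2, hyp_diff=0.0, equal_var=True):
    """Two-sample t statistic, degrees of freedom and two-sided P-value."""
    se, df = pooled_se(s1, n1, s2, n2) if equal_var else welch_se(s1, n1, s2, n2)
    t = ((m1 - m2) - hyp_diff) / se
    return t, df, two_sided_p(t, df)


# Driving-speed summaries (female first, matching the Minitab output order).
female = (85.71, 9.39, 21)
male = (102.06, 17.05, 17)
for equal_var in (True, False):
    t, df, p = two_sample_t(*female, *male, equal_var=equal_var)
    label = "pooled" if equal_var else "unpooled"
    print(f"{label}: T = {t:.2f}, DF = {df:.1f}, P = {p:.4f}")
```

With these rounded summaries the results should land close to the Minitab output for this example (T = -3.75, DF = 36, P = 0.0006 pooled; T = -3.54, DF = 23, P = 0.0017 unpooled). The unpooled df comes out non-integer (about 23.7), which Minitab appears to report rounded down.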
Pooled two-sample t-test

Two sample T for Fastest
Gender   N    Mean  StDev  SE Mean
female  21   85.71   9.39      2.0
male    17   102.1   17.1      4.1

95% CI for mu (female) - mu (male ): ( -25.2, -7.5)
T-Test mu (female) = mu (male ) (vs not =): T = -3.75  P = 0.0006  DF = 36
Both use Pooled StDev = 13.4

(Unpooled) two-sample t-test

Two sample T for Fastest
Gender   N    Mean  StDev  SE Mean
female  21   85.71   9.39      2.0
male    17   102.1   17.1      4.1

95% CI for mu (female) - mu (male ): ( -25.9, -6.8)
T-Test mu (female) = mu (male ) (vs not =): T = -3.54  P = 0.0017  DF = 23

Assumptions for correct P-values
• Data in each group follow a normal distribution.
• If you use the pooled t-test, the variances for each group are equal.
• The samples are independent. That is, who is in the second sample doesn't depend on who is in the first sample (and vice versa).

Confidence interval for difference in two means
We can be "such-and-such" confident that the difference in the population means falls in the interval...

$$\text{difference in sample means} \pm (t^* \times \text{standard error})$$

where the t* multiplier depends on the confidence level and is obtained from the appropriate t distribution.

Interpreting a confidence interval for the difference in two means…
If the confidence interval contains…   then we conclude…
zero                                   the two means may not differ
only positive numbers                  the first mean is larger than the second mean
only negative numbers                  the first mean is smaller than the second mean

Two-sample confidence interval in Minitab
• Select Stat. Select Basic Statistics.
• Select 2-sample t to get a pop-up window.
• Click the radio button before Samples in one Column. Put the measurement variable in the Samples box, and put the grouping variable in the Subscripts box.
• Specify the confidence level.
• If appropriate, select Assume Equal Variances.
• Select OK.
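The interval formula and the three reading rules can be combined in a short Python sketch. The helper names `diff_ci` and `interpret` are hypothetical, and the multiplier t* = 2.028 for 95% confidence with 36 degrees of freedom is an assumed value read from a t table rather than computed here:

```python
import math


def diff_ci(mean1, mean2, se, t_star):
    """Interval for mu1 - mu2: (difference in sample means) +/- t* x SE."""
    diff = mean1 - mean2
    margin = t_star * se
    return diff - margin, diff + margin


def interpret(lo, hi):
    """Reading rules for an interval for (first mean) - (second mean)."""
    if lo > 0:
        return "first mean is larger than second mean"
    if hi < 0:
        return "first mean is smaller than second mean"
    return "the two means may not differ"  # interval contains zero


# Driving-speed example, female minus male, pooled standard error.
f_mean, f_sd, n_f = 85.71, 9.39, 21
m_mean, m_sd, n_m = 102.06, 17.05, 17
sp2 = ((n_f - 1) * f_sd**2 + (n_m - 1) * m_sd**2) / (n_f + n_m - 2)
se_pooled = math.sqrt(sp2 * (1 / n_f + 1 / n_m))

# Assumed t* for 95% confidence with 36 df, read from a t table.
lo, hi = diff_ci(f_mean, m_mean, se_pooled, t_star=2.028)
print(f"95% CI: ({lo:.1f}, {hi:.1f}) -> {interpret(lo, hi)}")
```

This reproduces the pooled interval ( -25.2, -7.5) from the Minitab output. Because the interval contains only negative numbers, the rule says the first mean (female) is smaller than the second (male).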
Example

Two sample T for laundry
gender   N   Mean  StDev  SE Mean
M       44   3.07   1.81     0.27
F       44   3.89   3.88     0.58

95% CI for mu (M) - mu (F): ( -2.11, 0.47)
T-Test mu (M) = mu (F) (vs not =): T = -1.27  P = 0.21  DF = 60

Example
Do the average guesses of the population of Turkey differ depending on preliminary information received?
• Population of all people seeing "80 million": sample of n1 = 34 people.
• Population of all people seeing "10 million": sample of n2 = 33 people.

Randomized comparative experiment
• A study in which two or more groups are randomly assigned to a "treatment" to see how the treatment affects some "response."
• If each "experimental unit" has the same chance of receiving any treatment, then the experiment is called a "completely randomized design."

Graphical summary of data
[Figure: guesses of the population of Turkey (axis from 0 to 300), plotted separately for Form 80 and Form 10]

Two-sample t-test results

Two sample T for Turkey
Form2   N   Mean  StDev  SE Mean
10     33  12.50   8.50      1.5
80     34   62.8   54.8      9.4

95% CI for mu (10) - mu (80): ( -69.6, -30.9)
T-Test mu (10) = mu (80) (vs <): T = -5.28  P = 0.0000  DF = 34

Conclusions of Turkey experiment
• There is sufficient evidence, at the 0.05 level, to conclude that the average guesses of the population of Turkey differ between the two forms.
• The population mean guess of the "10 million" form is lower than the population mean guess of the "80 million" form.
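Both outputs in these examples appear to be unpooled: their DF values (60 and 34) are not n1 + n2 - 2 (86 and 65). A minimal Python sketch, assuming the Welch-Satterthwaite degrees of freedom and using hypothetical helper names, applies the "neither standard deviation more than twice the other" rule of thumb and recomputes T and the df from the rounded summary statistics:

```python
import math


def can_pool(s1, s2):
    """Rule of thumb: pool only if neither SD is more than twice the other."""
    return max(s1, s2) <= 2 * min(s1, s2)


def welch(m1, s1, n1, m2, s2, n2):
    """Unpooled t statistic and (assumed) Welch-Satterthwaite df."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


# (mean1, sd1, n1, mean2, sd2, n2) copied from the Minitab outputs above.
examples = {
    "laundry (M vs F)": (3.07, 1.81, 44, 3.89, 3.88, 44),
    "Turkey (10 vs 80)": (12.50, 8.50, 33, 62.8, 54.8, 34),
}

for name, (m1, s1, n1, m2, s2, n2) in examples.items():
    t, df = welch(m1, s1, n1, m2, s2, n2)
    print(f"{name}: pool? {can_pool(s1, s2)}, T = {t:.2f}, Welch df = {df:.1f}")
```

In both examples one standard deviation is more than twice the other, so the rule of thumb advises against pooling, and the Welch df (about 60.9 and 34.6) rounded down matches the reported DF. The Turkey T may differ from Minitab's -5.28 in the last digit because the summary means are rounded.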