Two-Sample Hypothesis Testing
Suppose you want to know if two populations have the same mean or, equivalently, if the difference between the population means is zero.
The population variances are known to be $\sigma_1^2$ and $\sigma_2^2$.
You have independent samples from the two populations. Their sizes are $n_1$ and $n_2$.
The sample mean for the sample from the first population is $\bar{X}_1$. The mean of $\bar{X}_1$ equals the mean of the first population, $\mu_1$, and the variance of $\bar{X}_1$ is $\sigma_1^2/n_1$.
Similarly, the sample mean for the sample from the second population is $\bar{X}_2$. The mean of $\bar{X}_2$ equals the mean of the second population, $\mu_2$, and the variance of $\bar{X}_2$ is $\sigma_2^2/n_2$.
$E(\bar{X}_1 - \bar{X}_2) = \mu_1 - \mu_2$.
Since $\bar{X}_1$ and $\bar{X}_2$ are independent,
$$ V(\bar{X}_1 - \bar{X}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}, $$
and the standard deviation of $\bar{X}_1 - \bar{X}_2$ is
$$ \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}. $$
Also, $\bar{X}_1 - \bar{X}_2$ is approximately normally distributed. So we have a standard normal statistic
$$ Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}. $$
We'll use this formula to test whether the population means are equal.
Example
Suppose from a large class, we sample 4 grades: 64, 66, 89, 77.
From another large class, we sample 3 grades: 56, 71, 53.
We assume that the class grades are normally distributed, and that
the population variances for the two classes are both 96.
Test at the 5% level $H_0: \mu_1 = \mu_2$ versus $H_1: \mu_1 \neq \mu_2$.
Averaging the grades in the first sample, we have $\bar{X}_1 = 74$.
Averaging the grades in the second sample, we have $\bar{X}_2 = 60$.
Then
$$ Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{(74 - 60) - 0}{\sqrt{\dfrac{96}{4} + \dfrac{96}{3}}} = \frac{14}{\sqrt{24 + 32}} = \frac{14}{7.4833} \approx 1.87. $$
As we’ve found before, the critical Z-values for a two-tailed 5% test are 1.96 and -1.96.
[Sketch of the standard normal density: critical regions beyond $\pm 1.96$, each with area .025; acceptance region between, with area .95.]
Since our Z-statistic, 1.87, is in the acceptance region, we accept $H_0: \mu_1 - \mu_2 = 0$, concluding that the population means are equal.
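The arithmetic above can be checked with a few lines of code; the helper name `two_sample_z` is my own, but the formula is exactly the one just derived.

```python
from math import sqrt

def two_sample_z(xbar1, xbar2, var1, var2, n1, n2):
    """Two-sample Z statistic for H0: mu1 - mu2 = 0 with known population variances."""
    return (xbar1 - xbar2) / sqrt(var1 / n1 + var2 / n2)

z = two_sample_z(74, 60, 96, 96, 4, 3)
print(round(z, 2))  # 1.87, inside the acceptance region (-1.96, 1.96)
```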
What do you do if you don’t
know the population variances
in this formula?
Replace the population variances
with the sample variances and the Z
distribution with the t distribution.
The number of degrees of freedom is
the integer part of this very messy
formula:
$$ Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \quad\longrightarrow\quad t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} $$

$$ \text{dof} = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}} $$
Example
Consider the same example as the last one but without the information on the population variances. Again test at the 5% level $H_0: \mu_1 = \mu_2$ versus $H_1: \mu_1 \neq \mu_2$.
We need to determine the sample means and sample variances. As before, the sample means are 74 and 60.

    Class 1 (X1)    Class 2 (X2)
         64              56
         66              71
         89              53
         77
    sum 296         sum 180

$$ \bar{X}_1 = \frac{296}{4} = 74 \qquad \bar{X}_2 = \frac{180}{3} = 60 $$
Remember: the sample variance is
$$ s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}. $$
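This formula is easy to check numerically; a minimal sketch applying it to the two grade samples (the function and variable names are mine):

```python
def sample_variance(xs):
    """Sample variance: sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

class1 = [64, 66, 89, 77]
class2 = [56, 71, 53]
print(round(sample_variance(class1), 2))  # 132.67
print(round(sample_variance(class2), 2))  # 93.0
```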
So we subtract the sample mean from each of the grades. Then we square those differences and add them up. Then we divide each sum by $n - 1$ to get the sample variance.

    Class 1                             Class 2
    X1    X1 - X̄1   (X1 - X̄1)²         X2    X2 - X̄2   (X2 - X̄2)²
    64      -10        100              56      -4         16
    66       -8         64              71      11        121
    89       15        225              53      -7         49
    77        3          9
    296                398              180                186

$$ s_1^2 = \frac{398}{3} = 132.67 \qquad s_2^2 = \frac{186}{2} = 93.0 $$
What are the dof & critical t value?
Since we have $s_1^2 = 132.67$, $s_2^2 = 93.0$, $n_1 = 4$, and $n_2 = 3$, our very messy dof formula yields
$$ \frac{\left(\dfrac{132.67}{4} + \dfrac{93.0}{3}\right)^2}{\dfrac{(132.67/4)^2}{3} + \dfrac{(93.0/3)^2}{2}} = 4.860. $$
So the degrees of freedom is the integer part of 4.86, or 4.
For a 5% two-tailed test & 4 dof, the critical t-value is 2.7764.
[Sketch of the $t_4$ density: critical regions beyond $\pm 2.7764$, each with area .025; acceptance region between, with area .95.]
Next we need to compute our test statistic. With $n_1 = 4$, $\bar{X}_1 = 74$, $s_1^2 = 132.67$, $n_2 = 3$, $\bar{X}_2 = 60$, and $s_2^2 = 93.0$,
$$ t_4 = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{(74 - 60) - 0}{\sqrt{\dfrac{132.67}{4} + \dfrac{93}{3}}} \approx 1.748. $$
Since our t-value, 1.748, is in the acceptance region, we accept $H_0: \mu_1 = \mu_2$.
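When the raw grades are available, this unequal-variances test (often called Welch's t-test) can be reproduced directly; a sketch assuming SciPy is installed.

```python
from scipy import stats

class1 = [64, 66, 89, 77]
class2 = [56, 71, 53]

# Unequal-variances (Welch) t-test: equal_var=False uses s1^2/n1 + s2^2/n2
# in the denominator and the messy dof formula, as in the text.
t_stat, p_value = stats.ttest_ind(class1, class2, equal_var=False)
print(round(t_stat, 3))  # ~1.748
```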
Sometimes we don’t know the population variances, but we
believe that they are equal.
So we need to compute an estimate of the common variance,
which we do by pooling our information from the two samples.
We denote the pooled sample variance by $s_p^2$.
$s_p^2$ is a weighted average of the two sample variances, with more weight put on the sample variance that was based on the larger sample.
If the two samples are the same size, $s_p^2$ is just the sum of the two sample variances, divided by two.
In general,
$$ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}. $$
Let’s return for a moment to the statistic that we used to compare population means when the population variances were known:
$$ Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}. $$
Since the population variances $\sigma_1^2$ and $\sigma_2^2$ are believed to be equal, let's denote them both by $\sigma^2$. Then we can factor out the $\sigma^2$:
$$ Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma^2 \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}. $$
Replacing the $\sigma^2$ by $s_p^2$ and the $Z$ by $t$ gives
$$ t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2 \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}. $$
The number of degrees of freedom is $n_1 + n_2 - 2$.
Let’s do the previous example again, but this time assume that the unknown population variances are believed to be equal. We had $n_1 = 4$, $\bar{X}_1 = 74$, $s_1^2 = 132.67$, and $n_2 = 3$, $\bar{X}_2 = 60$, $s_2^2 = 93.0$. So
$$ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(3)(132.67) + (2)(93.0)}{4 + 3 - 2} = 116.8 $$
and
$$ t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2 \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{(74 - 60) - 0}{\sqrt{116.8\left(\dfrac{1}{4} + \dfrac{1}{3}\right)}} \approx 1.70. $$
The number of degrees of freedom is $n_1 + n_2 - 2 = 5$, and we are doing a 2-tailed test at the 5% level, so the critical t-value is 2.571.
[Sketch of the $t_5$ density: critical regions beyond $\pm 2.571$, each with area .025; acceptance region between.]
Since our t-statistic, 1.70, is in the acceptance region, we accept $H_0: \mu_1 = \mu_2$.
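SciPy can also run this pooled test straight from the summary statistics in the example; a sketch assuming SciPy is installed (note it takes standard deviations, not variances).

```python
from math import sqrt
from scipy import stats

# Pooled (equal-variance) two-sample t-test from summary statistics.
t_stat, p_value = stats.ttest_ind_from_stats(
    mean1=74, std1=sqrt(132.67), nobs1=4,
    mean2=60, std2=sqrt(93.0),  nobs2=3,
    equal_var=True,  # pool the two sample variances into s_p^2
)
print(round(t_stat, 2))  # ~1.70
```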
In the previous three hypothesis tests, we tested whether 2 populations have the same mean, when we had 2 independent samples.
We can’t use those tests, however, if the 2 samples
are not independent.
For example, suppose you are looking at the weights
of people, before and after a fitness program.
Since the weights are for the same group of people,
the before and after weights are not independent of
each other.
In this type of situation, we can use a hypothesis test
based on matched-pairs samples.
The hypotheses are $H_0: \mu_D = 0$ and $H_1: \mu_D \neq 0$, where $\mu_D$ is the population mean difference.
The test statistic is
$$ t_{n-1} = \frac{\bar{D} - \mu_D}{s_D / \sqrt{n}} $$
where $\bar{D}$ is the sample mean difference, $s_D$ is the sample standard deviation of the differences, and $n$ is the number of pairs of observations.
Example
Before and after a fitness program, the following sample of weights is observed. Test at the 5% level whether the program causes a weight change, i.e.: $H_0: \mu_D = 0$ versus $H_1: \mu_D \neq 0$.
First we calculate the weight differences, $D = A - B$. Then we add up the differences and determine the mean. Next we subtract the mean difference from each of the D values, square the values in that column, and add up the squares.

    person   Before   After   D = A-B   D - D̄   (D - D̄)²
       1       168     160      -8        -4        16
       2       195     197       2         6        36
       3       155     150      -5        -1         1
       4       183     180      -3         1         1
       5       169     163      -6        -2         4
                       sums:   -20                  58

The differences sum to -20, so
$$ \bar{D} = \frac{-20}{5} = -4. $$
The sample standard deviation of the differences is
$$ s_D = \sqrt{\frac{\sum (D - \bar{D})^2}{n - 1}}, $$
so we divide the sum of squares by $n - 1 = 4$ and take the square root:
$$ s_D = \sqrt{\frac{58}{4}} = \sqrt{14.5} = 3.81. $$
Next we assemble our statistic:
$$ t_{n-1} = \frac{\bar{D} - \mu_D}{s_D / \sqrt{n}} = \frac{-4 - 0}{3.81 / \sqrt{5}} \approx -2.35. $$
Since we had 5 people and 5 pairs of weights, n=5, and
the number of degrees of freedom is n-1 = 4.
We’re doing a 2-tailed t-test at the 5% level, so the critical t-value is 2.776.
[Sketch of the $t_4$ density: critical regions beyond $\pm 2.776$, each with area .025; acceptance region between.]
Since our t-statistic, -2.35, is in the acceptance region, we accept
the null hypothesis that the program would cause no average
weight change for the population as a whole.
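The whole matched-pairs calculation collapses to one call if the raw weights are available; a sketch assuming SciPy is installed.

```python
from scipy import stats

before = [168, 195, 155, 183, 169]
after = [160, 197, 150, 180, 163]

# Paired (matched-pairs) t-test on the differences D = After - Before
t_stat, p_value = stats.ttest_rel(after, before)
print(round(t_stat, 2))  # ~ -2.35, inside the acceptance region (-2.776, 2.776)
```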
Hypothesis tests on the difference between 2 population proportions, using independent samples: $H_0: \pi_1 - \pi_2 = 0$ versus $H_1: \pi_1 \neq \pi_2$.
If you look at the statistics we have used in our hypothesis tests, you will notice that they have a common form:
$$ \frac{\text{(point estimate)} - \text{(mean of the point estimate)}}{\text{std dev, or estimate of the std dev, of the point estimate}} $$
In our hypothesis tests on the difference between 2 population proportions, we are going to use that same form.
The point estimate is $p_1 - p_2$, the difference in the sample proportions. The mean of the point estimate is $\pi_1 - \pi_2$, the difference in the population proportions.
We still need to determine the standard deviation, or an estimate of the standard deviation, of our point estimate.
We start with $V(p_1 - p_2)$. Under our assumption that the samples are independent,
$$ V(p_1 - p_2) = V(p_1) + V(p_2) = \frac{\pi_1(1 - \pi_1)}{n_1} + \frac{\pi_2(1 - \pi_2)}{n_2}. $$
According to the null hypothesis, $\pi_1 = \pi_2$, so we'll call them both $\pi$. So
$$ V(p_1 - p_2) = \frac{\pi(1 - \pi)}{n_1} + \frac{\pi(1 - \pi)}{n_2} = \pi(1 - \pi)\left(\frac{1}{n_1} + \frac{1}{n_2}\right), $$
but we don't know what $\pi$ is.
but we don't know what  is.
We need to estimate the hypothetically common value of  .
Let X1 be the number of "successes" in the 1st sample, which is of size n1 ,
and X 2 be the number of "successes" in the 2nd sample, which is of size n 2 .
Our estimate of the common value for  will be the proportion of successes
in the combined sample or p 
X1  X 2
.
n1  n 2
 1
1 
So our estimate of V(p1 - p2 ) is p(1-p)    , and our estimate
 n1 n 2 
of the standard deviation of p1 - p2 is the square root of that
expression.
Assembling the pieces, we have
Z 
(p1 - p 2 ) - (1 - 2 )
 1
1 
p(1-p)   
 n1 n 2 
where
X1  X 2
p
n1  n 2
.
Suppose the proportions of Democrats in samples of 100 and 225 from 2 states are 33% and 20%. Test at the 5% level the hypothesis that the proportions of Democrats in the populations of the 2 states are equal: $H_0: \pi_1 - \pi_2 = 0$ versus $H_1: \pi_1 - \pi_2 \neq 0$.
The number of Democrats in the first sample is $(.33)(100) = 33$, and the number in the second sample is $(.20)(225) = 45$. So the proportion in the combined sample is
$$ \bar{p} = \frac{33 + 45}{100 + 225} = \frac{78}{325} = 0.24, \qquad 1 - \bar{p} = 0.76, $$
and
$$ Z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{\bar{p}(1 - \bar{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{(.33 - .20) - 0}{\sqrt{(.24)(.76)\left(\dfrac{1}{100} + \dfrac{1}{225}\right)}} \approx 2.53. $$
We’re doing a 2-tailed Z-test at the 5% level, so the critical region lies beyond $\pm 1.96$.
[Sketch of the standard normal density: critical regions beyond $\pm 1.96$, each with area .025; acceptance region between.]
Since our Z-statistic, 2.53, is in the critical region, we reject the
null hypothesis and accept the alternative that the proportions of
Democrats in the 2 states are different.
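The pooled two-proportion Z statistic is a few lines of code; the helper name `two_proportion_z` is my own.

```python
from math import sqrt

def two_proportion_z(x1, n1, x2, n2):
    """Z statistic for H0: pi1 - pi2 = 0, using the pooled sample proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)  # proportion of successes in the combined sample
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

z = two_proportion_z(33, 100, 45, 225)
print(round(z, 2))  # 2.53, beyond the critical value 1.96, so reject H0
```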
Sometimes you want to test whether two
independent samples have the same variance.
If the populations are normally distributed,
we can use the F-statistic to perform the test.
The F-statistic is
$$ F_{n_1 - 1,\, n_2 - 1} = \frac{s_1^2}{s_2^2} $$
where
$$ s_1^2 = \frac{\sum_{i=1}^{n_1} (X_{1i} - \bar{X}_1)^2}{n_1 - 1} $$
is the sample variance for the first sample,
$$ s_2^2 = \frac{\sum_{i=1}^{n_2} (X_{2i} - \bar{X}_2)^2}{n_2 - 1} $$
is the sample variance for the second sample, and $s_1^2$ is the larger of the two sample variances.
Notice that, because $s_1^2$ is larger than $s_2^2$, this F-statistic will always be greater than 1.
So our critical region will always just be the upper tail.
This F-statistic has n1-1 degrees of freedom for the numerator, and
n2-1 degrees of freedom for the denominator.
The distribution of our F-statistic, $F_{n_1 - 1,\, n_2 - 1} = s_1^2 / s_2^2$, with the tail for the critical region looks like this:
[Sketch of the F density $f(F)$: acceptance region on the left, critical region in the upper tail.]
Two-sided versus one-sided tests
for equality of variance
While you are always using the upper tail of the
F-test on tests of equality of variance, the size of
the critical region you sketch varies with whether
you have a two-sided or a one-sided test.
Let’s see why this is true.
For a two-sided test, we have: $H_0: \sigma_1^2 = \sigma_2^2$ versus $H_1: \sigma_1^2 \neq \sigma_2^2$.
While, for our samples, the sample variance from the first group was greater, $s_1^2 > s_2^2$, our alternative hypothesis indicates that we think that the population variance could have been larger or smaller for the first population: $\sigma_1^2 > \sigma_2^2$ or $\sigma_1^2 < \sigma_2^2$.
Our sketch of the critical region is based on the situation in which the
variance is greater for the first group, but we admit that, if we had
information for the entire population, we might find that the situation is
reversed.
So there is an implicit second sketch of an F-statistic in which the sample
variance of the second group is in the numerator.
Thus, for each of the sketches, the sketch we draw and the implicit sketch,
the area of the critical region is α/2, half of the test level α.
So, for example, if you are doing a two-sided test at the 5% level, your
sketch will show a tail area of 0.025.
What if we are performing a one-sided test?
$H_0: \sigma_1^2 \le \sigma_2^2$ versus $H_1: \sigma_1^2 > \sigma_2^2$
Now we are looking at a situation in which the sample variance is again
larger for the first group. This time however, we want to know if, in
fact, the population variance is really larger for the first group. So we
have the one-sided alternative shown above.
Keep in mind that, as usual with one-sided tests, the null hypothesis is
the devil’s advocate view. Here the devil’s advocate is saying: nah,
the population variance for the first group isn’t really any larger than
for the second group.
For a one-sided test with level α, your critical region will have area α.
For example, if you are performing a one-sided test at the 5% level, the
critical region will have area 0.05.
Example: You are looking at test results for two groups of students.
There are 25 students in the first group, for which you have calculated
the sample variance to be 15. There are 30 students in the second
group, for which you have calculated the sample variance to be 10.
Test at the 10% level whether the population variances are the same.
$$ F_{24,\,29} = \frac{s_1^2}{s_2^2} = \frac{15}{10} = 1.5 $$
There are 25-1 = 24 degrees of freedom in the numerator and 30-1 = 29 degrees of freedom in the denominator.
This is a two-sided test, so the critical region has area 0.05; the critical value is 1.90.
[Sketch of the F density $f(F)$: acceptance region below 1.90, critical region with area 0.05 beyond it.]
Because 1.5 is in the acceptance region, you cannot reject the null hypothesis and you conclude that the variances of the two populations are the same.
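The critical value 1.90 comes from an F table; it can be reproduced with SciPy's F distribution (a sketch assuming SciPy is installed).

```python
from scipy import stats

n1, n2 = 25, 30
s1_sq, s2_sq = 15, 10  # the larger sample variance goes in the numerator

f_stat = s1_sq / s2_sq
# Two-sided 10% test => upper-tail area 0.05 with (24, 29) degrees of freedom
crit = stats.f.ppf(1 - 0.05, n1 - 1, n2 - 1)
print(round(f_stat, 2), round(crit, 2))  # 1.5 and roughly 1.90: cannot reject H0
```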
In the two sections we have just completed, we did 9
different types of hypothesis tests.
1. population mean - 1 sample - known population variance
2. population mean - 1 sample - unknown population variance
3. population proportion - 1 sample
4. difference in population means - 2 independent samples - known population variances
5. difference in population means - 2 independent samples - unknown population variances
6. difference in population means - 2 independent samples - unknown population variances that are believed to be equal
7. difference in population means - 2 dependent samples
8. difference in population proportions - 2 independent samples
9. difference in population variances - 2 independent samples
The statistics for these tests are compiled on a summary sheet
which is available at my web site.