Download Lecture 19

Comparing Populations
Proportions and means

The sampling distribution of differences of Normal random variables

If $X$ and $Y$ denote two independent normal random variables, then $D = X - Y$ is normal with mean

$$\mu_D = \mu_X - \mu_Y$$

and standard deviation

$$\sigma_D = \sqrt{\sigma_X^2 + \sigma_Y^2}$$

Comparing proportions

Situation
• We have two populations (1 and 2).
• Let $p_1$ denote the probability (proportion) of "success" in population 1.
• Let $p_2$ denote the probability (proportion) of "success" in population 2.
• The objective is to compare the two population proportions.

Consider the statistic

$$D = \hat p_1 - \hat p_2 = \frac{x_1}{n_1} - \frac{x_2}{n_2}$$

where $x_1$ and $x_2$ are the numbers of successes observed in samples of sizes $n_1$ and $n_2$. For large samples this statistic has an approximately normal distribution with mean

$$\mu_D = \mu_{\hat p_1} - \mu_{\hat p_2} = p_1 - p_2$$

and standard deviation

$$\sigma_D = \sigma_{\hat p_1 - \hat p_2} = \sqrt{\sigma_{\hat p_1}^2 + \sigma_{\hat p_2}^2} = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}} \approx \sqrt{\frac{\hat p_1(1 - \hat p_1)}{n_1} + \frac{\hat p_2(1 - \hat p_2)}{n_2}}$$

Thus

$$z = \frac{D - \mu_D}{\sigma_D} = \frac{(\hat p_1 - \hat p_2) - (p_1 - p_2)}{\sigma_{\hat p_1 - \hat p_2}} = \frac{(\hat p_1 - \hat p_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1(1 - p_1)}{n_1} + \dfrac{p_2(1 - p_2)}{n_2}}} \approx \frac{(\hat p_1 - \hat p_2) - (p_1 - p_2)}{\sqrt{\dfrac{\hat p_1(1 - \hat p_1)}{n_1} + \dfrac{\hat p_2(1 - \hat p_2)}{n_2}}}$$

has (approximately) a standard normal distribution.

We want to test either:
1. $H_0: p_1 = p_2$ vs $H_A: p_1 \ne p_2$, or
2. $H_0: p_1 \le p_2$ vs $H_A: p_1 > p_2$, or
3. $H_0: p_1 \ge p_2$ vs $H_A: p_1 < p_2$

If $p_1 = p_2$ ($= p$, say) then $p_1 - p_2 = 0$ and the test statistic

$$z = \frac{\hat p_1 - \hat p_2}{\sqrt{p(1 - p)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \approx \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p(1 - \hat p)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

has (approximately) a standard normal distribution, where

$$\hat p = \frac{x_1 + x_2}{n_1 + n_2}$$

is an estimate of the common value of $p_1$ and $p_2$.
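The pooled statistic derived above can be sketched in a few lines of Python. This is only an illustrative check, not part of the lecture: the helper name `two_proportion_z` is hypothetical, the counts in the demo are the ones quoted in the lecture's pipe-smoking example, and the critical value comes from the standard library's `NormalDist` rather than a table.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for H0: p1 = p2 (hypothetical helper name)."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Estimate of the common value p under H0
    p_pool = (x1 + x2) / (n1 + n2)
    # Standard error of p1_hat - p2_hat when p1 = p2 = p
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se


# Counts from the lecture's pipe-smoking example:
# population 1 = nonsmokers, population 2 = pipe smokers
z = two_proportion_z(117, 1067, 54, 402)

# One-sided test of H0: p1 >= p2 vs HA: p1 < p2 at alpha = 0.05
alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha)  # upper-alpha point z_alpha
reject_h0 = z < -z_crit

print(f"z = {z:.3f}, critical value = {-z_crit:.3f}, reject H0: {reject_h0}")
```

Passing counts rather than proportions avoids rounding $\hat p_1$, $\hat p_2$ and $\hat p$ before they are combined, so the result may differ slightly in the last digit from a hand calculation with rounded values.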
Thus, for comparing two binomial probabilities $p_1$ and $p_2$, the test statistic is

$$z = \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p(1 - \hat p)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

where

$$\hat p_1 = \frac{x_1}{n_1}, \quad \hat p_2 = \frac{x_2}{n_2} \quad \text{and} \quad \hat p = \frac{x_1 + x_2}{n_1 + n_2}$$

The Critical Region
• $H_A: p_1 \ne p_2$: reject $H_0$ if $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$
• $H_A: p_1 > p_2$: reject $H_0$ if $z > z_{\alpha}$
• $H_A: p_1 < p_2$: reject $H_0$ if $z < -z_{\alpha}$

Example
• In a national study to determine if there was an increase in mortality due to pipe smoking, a random sample of $n_1 = 1067$ male nonsmoking pensioners was observed for a five-year period.
• In addition, a sample of $n_2 = 402$ male pensioners who had smoked a pipe for more than six years was observed for the same five-year period.
• At the end of the five-year period, $x_1 = 117$ of the nonsmoking pensioners had died while $x_2 = 54$ of the pipe-smoking pensioners had died.
• Is the mortality rate for pipe smokers higher than that for nonsmokers?

We want to test: $H_0: p_1 \ge p_2$ vs $H_A: p_1 < p_2$

The test statistic:

$$z = \frac{\hat p_1 - \hat p_2}{\sigma_{\hat p_1 - \hat p_2}} = \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p(1 - \hat p)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

Note:

$$\hat p_1 = \frac{x_1}{n_1} = \frac{117}{1067} = 0.1097, \qquad \hat p_2 = \frac{x_2}{n_2} = \frac{54}{402} = 0.1343$$

$$\hat p = \frac{x_1 + x_2}{n_1 + n_2} = \frac{117 + 54}{1067 + 402} = \frac{171}{1469} = 0.1164$$

The test statistic:

$$z = \frac{0.1097 - 0.1343}{\sqrt{0.1164(1 - 0.1164)\left(\dfrac{1}{1067} + \dfrac{1}{402}\right)}} = -1.315$$

We reject $H_0$ if $z < -z_{\alpha} = -z_{0.05} = -1.645$. This is not true ($-1.315 > -1.645$), hence we accept $H_0$.

Conclusion: There is not a significant ($\alpha = 0.05$) increase in the mortality rate due to pipe smoking.

Estimating a difference in proportions using confidence intervals

Situation
• We have two populations (1 and 2).
• Let $p_1$ denote the probability (proportion) of "success" in population 1.
• Let $p_2$ denote the probability (proportion) of "success" in population 2.
• The objective is to estimate the difference in the two population proportions, $d = p_1 - p_2$.
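Confidence Interval for $d = p_1 - p_2$

The $100P\% = 100(1 - \alpha)\%$ confidence interval is

$$\hat p_1 - \hat p_2 \pm z_{\alpha/2}\,\sigma_{\hat p_1 - \hat p_2} = \hat p_1 - \hat p_2 \pm z_{\alpha/2}\sqrt{\frac{\hat p_1(1 - \hat p_1)}{n_1} + \frac{\hat p_2(1 - \hat p_2)}{n_2}}$$

Example
• Estimating the increase in the mortality rate for pipe smokers over that for nonsmokers, $d = p_2 - p_1$.

$$\hat p_2 - \hat p_1 \pm z_{\alpha/2}\sqrt{\frac{\hat p_1(1 - \hat p_1)}{n_1} + \frac{\hat p_2(1 - \hat p_2)}{n_2}}$$

$$0.1343 - 0.1097 \pm 1.960\sqrt{\frac{0.1097(1 - 0.1097)}{1067} + \frac{0.1343(1 - 0.1343)}{402}}$$

$$0.0247 \pm 0.0382$$

that is, $-0.0136$ to $0.0629$, or $-1.36\%$ to $6.29\%$. The interval contains 0, which agrees with the test of the same data not rejecting $H_0$.

Comparing Means

Situation
• We have two normal populations (1 and 2).
• Let $\mu_1$ and $\sigma_1$ denote the mean and standard deviation of population 1.
• Let $\mu_2$ and $\sigma_2$ denote the mean and standard deviation of population 2.
• Let $x_1, x_2, x_3, \dots, x_n$ denote a sample from normal population 1.
• Let $y_1, y_2, y_3, \dots, y_m$ denote a sample from normal population 2.
• The objective is to compare the two population means.

We want to test either:
1. $H_0: \mu_1 = \mu_2$ vs $H_A: \mu_1 \ne \mu_2$, or
2. $H_0: \mu_1 \le \mu_2$ vs $H_A: \mu_1 > \mu_2$, or
3. $H_0: \mu_1 \ge \mu_2$ vs $H_A: \mu_1 < \mu_2$

Consider the test statistic:

$$z = \frac{\bar x - \bar y - \mu_{\bar x - \bar y}}{\sigma_{\bar x - \bar y}} = \frac{\bar x - \bar y}{\sqrt{\dfrac{\sigma_1^2}{n} + \dfrac{\sigma_2^2}{m}}} \approx \frac{\bar x - \bar y}{\sqrt{\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}}}$$

where the middle expression takes $\mu_{\bar x - \bar y} = \mu_1 - \mu_2 = 0$.

If $H_0: \mu_1 = \mu_2$ is true, then

$$z = \frac{\bar x - \bar y}{\sqrt{\dfrac{\sigma_1^2}{n} + \dfrac{\sigma_2^2}{m}}} \approx \frac{\bar x - \bar y}{\sqrt{\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}}}$$

• will have a standard Normal distribution.
• This will also be true for the approximation (obtained by replacing $\sigma_1$ by $s_x$ and $\sigma_2$ by $s_y$) if the sample sizes $n$ and $m$ are large (greater than 30).

Note:

$$\bar x = \frac{\sum_{i=1}^{n} x_i}{n}, \qquad s_x = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar x)^2}{n - 1}}$$

$$\bar y = \frac{\sum_{i=1}^{m} y_i}{m}, \qquad s_y = \sqrt{\frac{\sum_{i=1}^{m}(y_i - \bar y)^2}{m - 1}}$$

The Critical Region
• $H_A: \mu_1 \ne \mu_2$: reject $H_0$ if $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$
• $H_A: \mu_1 > \mu_2$: reject $H_0$ if $z > z_{\alpha}$
• $H_A: \mu_1 < \mu_2$: reject $H_0$ if $z < -z_{\alpha}$

Example
• A study was interested in determining if an exercise program had some effect on the reduction of blood pressure in subjects with abnormally high blood pressure.
• For this purpose a sample of $n = 500$ patients with abnormally high blood pressure were required to adhere to the exercise regime.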
• A second sample of $m = 400$ patients with abnormally high blood pressure were not required to adhere to the exercise regime.
• After a period of one year the reduction in blood pressure was measured for each patient in the study.

We want to test:

$H_0: \mu_1 \le \mu_2$ (the exercise group did not have a higher average reduction in blood pressure)
vs
$H_A: \mu_1 > \mu_2$ (the exercise group did have a higher average reduction in blood pressure)

The test statistic:

$$z = \frac{\bar x - \bar y - \mu_{\bar x - \bar y}}{\sigma_{\bar x - \bar y}} = \frac{\bar x - \bar y}{\sqrt{\dfrac{\sigma_1^2}{n} + \dfrac{\sigma_2^2}{m}}} \approx \frac{\bar x - \bar y}{\sqrt{\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}}}$$

Suppose the data has been collected and:

$$\bar x = \frac{\sum_{i=1}^{n} x_i}{n} = 10.67, \qquad s_x = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar x)^2}{n - 1}} = 3.895$$

$$\bar y = \frac{\sum_{i=1}^{m} y_i}{m} = 7.83, \qquad s_y = \sqrt{\frac{\sum_{i=1}^{m}(y_i - \bar y)^2}{m - 1}} = 4.224$$

The test statistic:

$$z = \frac{\bar x - \bar y}{\sqrt{\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}}} = \frac{10.67 - 7.83}{\sqrt{\dfrac{(3.895)^2}{500} + \dfrac{(4.224)^2}{400}}} = \frac{2.84}{0.273765} = 10.4$$

We reject $H_0$ if $z > z_{\alpha} = z_{0.05} = 1.645$. This is true, hence we reject $H_0$.

Conclusion: There is a significant ($\alpha = 0.05$) effect due to the exercise regime on the reduction in blood pressure.

Estimating a difference in means using confidence intervals

Situation
• We have two populations (1 and 2).
• Let $\mu_1$ denote the mean of population 1.
• Let $\mu_2$ denote the mean of population 2.
• The objective is to estimate the difference in the two population means, $d = \mu_1 - \mu_2$.

Confidence Interval for $d = \mu_1 - \mu_2$

$$\hat\mu_1 - \hat\mu_2 \pm z_{\alpha/2}\,\sigma_{\hat\mu_1 - \hat\mu_2} = \bar x - \bar y \pm z_{\alpha/2}\sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}}$$

Example
• Estimating the increase in the average reduction in blood pressure due to the exercise regime, $d = \mu_1 - \mu_2$.

$$\bar x - \bar y \pm z_{\alpha/2}\sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}}$$

$$10.67 - 7.83 \pm 1.960\sqrt{\frac{(3.895)^2}{500} + \frac{(4.224)^2}{400}}$$

$$2.84 \pm 1.96(0.273765)$$

$$2.84 \pm 0.537$$

that is, $2.303$ to $3.377$.

Comparing Means – small samples

Situation
• We have two normal populations (1 and 2).
• Let $\mu_1$ and $\sigma_1$ denote the mean and standard deviation of population 1.
• Let $\mu_2$ and $\sigma_2$ denote the mean and standard deviation of population 2.
• Let $x_1, x_2, x_3, \dots, x_n$ denote a sample from normal population 1.
• Let $y_1, y_2, y_3, \dots, y_m$ denote a sample from normal population 2.
• The objective is to compare the two population means.

We want to test either:
1. $H_0: \mu_1 = \mu_2$ vs $H_A: \mu_1 \ne \mu_2$, or
2. $H_0: \mu_1 \le \mu_2$ vs $H_A: \mu_1 > \mu_2$, or
3. $H_0: \mu_1 \ge \mu_2$ vs $H_A: \mu_1 < \mu_2$

Consider the test statistic:

$$z = \frac{\bar x - \bar y - \mu_{\bar x - \bar y}}{\sigma_{\bar x - \bar y}} = \frac{\bar x - \bar y}{\sqrt{\dfrac{\sigma_1^2}{n} + \dfrac{\sigma_2^2}{m}}} \approx \frac{\bar x - \bar y}{\sqrt{\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}}}$$

If the sample sizes ($m$ and $n$) are large, the statistic

$$t = \frac{\bar x - \bar y}{\sqrt{\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}}}$$

will have approximately a standard normal distribution. This will not be the case if the sample sizes ($m$ and $n$) are small.

The t test – for comparing means – small samples (equal variances)

Situation
• We have two normal populations (1 and 2).
• Let $\mu_1$ and $\sigma$ denote the mean and standard deviation of population 1.
• Let $\mu_2$ and $\sigma$ denote the mean and standard deviation of population 2.
• Note: we assume that the standard deviation for each population is the same, $\sigma_1 = \sigma_2 = \sigma$.

Let

$$\bar x = \frac{\sum_{i=1}^{n} x_i}{n}, \qquad s_x = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar x)^2}{n - 1}}$$

$$\bar y = \frac{\sum_{i=1}^{m} y_i}{m}, \qquad s_y = \sqrt{\frac{\sum_{i=1}^{m}(y_i - \bar y)^2}{m - 1}}$$

The pooled estimate of $\sigma$

Note: both $s_x$ and $s_y$ are estimators of $\sigma$. These can be combined to form a single estimator of $\sigma$, $s_{Pooled}$:

$$s_{Pooled} = \sqrt{\frac{(n - 1)s_x^2 + (m - 1)s_y^2}{n + m - 2}}$$

The test statistic:

$$t = \frac{\bar x - \bar y}{\sqrt{\dfrac{s_{Pooled}^2}{n} + \dfrac{s_{Pooled}^2}{m}}} = \frac{\bar x - \bar y}{s_{Pooled}\sqrt{\dfrac{1}{n} + \dfrac{1}{m}}}$$

If $\mu_1 = \mu_2$ this statistic has a t distribution with $n + m - 2$ degrees of freedom.

The Critical Region
• $H_A: \mu_1 \ne \mu_2$: reject $H_0$ if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$
• $H_A: \mu_1 > \mu_2$: reject $H_0$ if $t > t_{\alpha}$
• $H_A: \mu_1 < \mu_2$: reject $H_0$ if $t < -t_{\alpha}$

Here $t_{\alpha/2}$ and $t_{\alpha}$ are critical points under the t distribution with $n + m - 2$ degrees of freedom.

Example
• A study was interested in determining if administration of a drug reduces cancerous tumour size.
• For this purpose $n + m = 9$ test animals are implanted with a cancerous tumour.
• $n = 3$ are selected at random and administered the drug.
• The remaining $m = 6$ are left untreated.
• Final tumour sizes are measured at the end of the test period.

We want to test:

$H_0: \mu_1 \ge \mu_2$ (the treated group did not have a lower average final tumour size)
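vs
$H_A: \mu_1 < \mu_2$ (the treated group did have a lower average final tumour size)

The test statistic:

$$t = \frac{\bar x - \bar y}{s_{Pooled}\sqrt{\dfrac{1}{n} + \dfrac{1}{m}}}$$

Suppose the data has been collected and:

drug treated: 1.89, 1.79, 1.29
untreated: 2.08, 1.28, 1.75, 1.90, 2.32, 2.16

$$\bar x = \frac{\sum_{i=1}^{n} x_i}{n} = 1.657, \qquad s_x = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar x)^2}{n - 1}} = 0.3215$$

$$\bar y = \frac{\sum_{i=1}^{m} y_i}{m} = 1.915, \qquad s_y = \sqrt{\frac{\sum_{i=1}^{m}(y_i - \bar y)^2}{m - 1}} = 0.3693$$

The test statistic:

$$s_{Pooled} = \sqrt{\frac{(n - 1)s_x^2 + (m - 1)s_y^2}{n + m - 2}} = \sqrt{\frac{2(0.3215)^2 + 5(0.3693)^2}{7}} = 0.3563$$

$$t = \frac{1.657 - 1.915}{0.3563\sqrt{\dfrac{1}{3} + \dfrac{1}{6}}} = \frac{-0.258}{0.252} = -1.025$$

We reject $H_0$ if $t < -t_{\alpha} = -t_{0.05} = -1.895$ with d.f. $= n + m - 2 = 7$. Since $-1.025$ is not less than $-1.895$, we accept $H_0$.

Conclusion: The drug treatment does not result in a significantly ($\alpha = 0.05$) smaller final tumour size.

Summary of Tests

One Sample Tests

Situation: Sample from the Normal distribution with unknown mean and known variance (testing $\mu$)
Test statistic: $z = \dfrac{\sqrt{n}\,(\bar x - \mu_0)}{\sigma}$
$H_0: \mu = \mu_0$
• $H_A: \mu \ne \mu_0$: critical region $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$
• $H_A: \mu > \mu_0$: critical region $z > z_{\alpha}$
• $H_A: \mu < \mu_0$: critical region $z < -z_{\alpha}$

Situation: Sample from the Normal distribution with unknown mean and unknown variance (testing $\mu$)
Test statistic: $t = \dfrac{\sqrt{n}\,(\bar x - \mu_0)}{s}$
$H_0: \mu = \mu_0$
• $H_A: \mu \ne \mu_0$: critical region $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$
• $H_A: \mu > \mu_0$: critical region $t > t_{\alpha}$
• $H_A: \mu < \mu_0$: critical region $t < -t_{\alpha}$

Situation: Testing of a binomial probability
Test statistic: $z = \dfrac{\hat p - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}}$
$H_0: p = p_0$
• $H_A: p \ne p_0$: critical region $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$
• $H_A: p > p_0$: critical region $z > z_{\alpha}$
• $H_A: p < p_0$: critical region $z < -z_{\alpha}$

Situation: Sample from the Normal distribution with unknown mean and unknown variance (testing $\sigma$)
Test statistic: $U = \dfrac{(n - 1)s^2}{\sigma_0^2}$
$H_0: \sigma = \sigma_0$
• $H_A: \sigma \ne \sigma_0$: critical region $U < \chi^2_{1 - \alpha/2}(n - 1)$ or $U > \chi^2_{\alpha/2}(n - 1)$
• $H_A: \sigma > \sigma_0$: critical region $U > \chi^2_{\alpha}(n - 1)$
• $H_A: \sigma < \sigma_0$: critical region $U < \chi^2_{1 - \alpha}(n - 1)$

Two Sample Tests

Situation: Two independent samples from the Normal distribution with unknown means and known variances (testing $\mu_1 - \mu_2$)
Test statistic: $z = \dfrac{\bar x_1 - \bar x_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$
$H_0: \mu_1 = \mu_2$
• $H_A: \mu_1 \ne \mu_2$: critical region $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$
• $H_A: \mu_1 > \mu_2$: critical region $z > z_{\alpha}$
• $H_A: \mu_1 < \mu_2$: critical region $z < -z_{\alpha}$

Situation: Two independent samples from the Normal distribution with unknown means and unknown but equal variances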
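(testing $\mu_1 - \mu_2$)
Test statistic: $t = \dfrac{\bar x_1 - \bar x_2}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$
$H_0: \mu_1 = \mu_2$
• $H_A: \mu_1 \ne \mu_2$: critical region $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$
• $H_A: \mu_1 > \mu_2$: critical region $t > t_{\alpha}$
• $H_A: \mu_1 < \mu_2$: critical region $t < -t_{\alpha}$

Situation: Testing the difference between two binomial probabilities, $p_1 - p_2$
Test statistic: $z = \dfrac{\hat p_1 - \hat p_2}{\sqrt{\hat p(1 - \hat p)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
$H_0: p_1 = p_2$
• $H_A: p_1 \ne p_2$: critical region $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$
• $H_A: p_1 > p_2$: critical region $z > z_{\alpha}$
• $H_A: p_1 < p_2$: critical region $z < -z_{\alpha}$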
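The two-sample tests for means summarized above can be sketched with the Python standard library. This is a rough illustration rather than the lecture's own code: the function names `pooled_t` and `large_sample_z` are hypothetical, the grouping of the nine tumour sizes into treated and untreated animals is an assumption reconstructed from the lecture's data and checked against its reported means, and the t critical value is simply the 1.895 quoted in the lecture, because the standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev


def pooled_t(x, y):
    """Equal-variance two-sample t statistic and its degrees of freedom."""
    n, m = len(x), len(y)
    sx, sy = stdev(x), stdev(y)  # sample standard deviations (n - 1 divisor)
    s_pooled = sqrt(((n - 1) * sx**2 + (m - 1) * sy**2) / (n + m - 2))
    t = (mean(x) - mean(y)) / (s_pooled * sqrt(1 / n + 1 / m))
    return t, n + m - 2


def large_sample_z(xbar, sx, n, ybar, sy, m):
    """Large-sample z statistic from summary statistics (n, m > 30)."""
    return (xbar - ybar) / sqrt(sx**2 / n + sy**2 / m)


# Tumour example: grouping assumed as described in the lead-in
treated = [1.89, 1.79, 1.29]
untreated = [2.08, 1.28, 1.75, 1.90, 2.32, 2.16]
t, df = pooled_t(treated, untreated)

T_CRIT = 1.895  # t_0.05 with 7 d.f., as quoted in the lecture
reject_tumour_h0 = t < -T_CRIT  # HA: mu1 < mu2

# Blood-pressure example: summary statistics quoted in the lecture
z = large_sample_z(10.67, 3.895, 500, 7.83, 4.224, 400)
Z_CRIT = 1.645  # z_0.05
reject_bp_h0 = z > Z_CRIT  # HA: mu1 > mu2

print(f"t = {t:.3f} on {df} d.f., reject H0: {reject_tumour_h0}")
print(f"z = {z:.2f}, reject H0: {reject_bp_h0}")
```

Keeping the critical values as explicit constants makes the decision rule visible; with a statistics package available, they could instead be looked up from the corresponding distributions.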
