Comparing Populations: Proportions and Means

The sampling distribution of differences of normal random variables
If $X$ and $Y$ denote two independent normal random variables, then $D = X - Y$ is normal with
mean $\mu_D = \mu_X - \mu_Y$
standard deviation $\sigma_D = \sqrt{\sigma_X^2 + \sigma_Y^2}$

Comparing proportions
Situation
• We have two populations (1 and 2).
• Let $p_1$ denote the probability (proportion) of "success" in population 1.
• Let $p_2$ denote the probability (proportion) of "success" in population 2.
• The objective is to compare the two population proportions.

Consider the statistic:
$$D = \hat{p}_1 - \hat{p}_2 = \frac{x_1}{n_1} - \frac{x_2}{n_2}$$
For large samples this statistic has approximately a normal distribution with
$$\mu_D = \mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2$$
$$\sigma_D = \sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}} \approx \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$
Thus
$$z = \frac{D - \mu_D}{\sigma_D} = \frac{\hat{p}_1 - \hat{p}_2 - (p_1 - p_2)}{\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}} \approx \frac{\hat{p}_1 - \hat{p}_2 - (p_1 - p_2)}{\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}}$$
has approximately a standard normal distribution.

We want to test either:
1. $H_0: p_1 = p_2$ vs $H_A: p_1 \ne p_2$, or
2. $H_0: p_1 \le p_2$ vs $H_A: p_1 > p_2$, or
3. $H_0: p_1 \ge p_2$ vs $H_A: p_1 < p_2$.

If $p_1 = p_2$ ($= p$, say), then the test statistic
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{p(1-p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \approx \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
has approximately a standard normal distribution, where
$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$$
is an estimate of the common value of $p_1$ and $p_2$.

Thus, for comparing two binomial probabilities $p_1$ and $p_2$, the test statistic is
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \quad \text{where } \hat{p}_1 = \frac{x_1}{n_1},\ \hat{p}_2 = \frac{x_2}{n_2} \text{ and } \hat{p} = \frac{x_1 + x_2}{n_1 + n_2}.$$

The Critical Region
Alternative hypothesis $H_A$: critical region
• $H_A: p_1 \ne p_2$: $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$
• $H_A: p_1 > p_2$: $z > z_{\alpha}$
• $H_A: p_1 < p_2$: $z < -z_{\alpha}$

Example
• In a national study to determine whether there was an increase in mortality due to pipe smoking, a random sample of $n_1 = 1067$ male nonsmoking pensioners was observed for a five-year period.
• In addition, a sample of $n_2 = 402$ male pensioners who had smoked a pipe for more than six years was observed for the same five-year period.
• At the end of the five-year period, $x_1 = 117$ of the nonsmoking pensioners had died, while $x_2 = 54$ of the pipe-smoking pensioners had died.
• Is the mortality rate for pipe smokers higher than that for nonsmokers?

We want to test:
$H_0: p_1 \ge p_2$ vs $H_A: p_1 < p_2$

The test statistic:
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sigma_{\hat{p}_1 - \hat{p}_2}} \approx \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

Note:
$$\hat{p}_1 = \frac{x_1}{n_1} = \frac{117}{1067} = 0.1097, \qquad \hat{p}_2 = \frac{x_2}{n_2} = \frac{54}{402} = 0.1343$$
$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{117 + 54}{1067 + 402} = \frac{171}{1469} = 0.1164$$

The test statistic:
$$z = \frac{0.1097 - 0.1343}{\sqrt{0.1164(1 - 0.1164)\left(\frac{1}{1067} + \frac{1}{402}\right)}} = -1.315$$

We reject $H_0$ if $z < -z_{\alpha} = -z_{0.05} = -1.645$.
This is not true, hence we do not reject $H_0$.
Conclusion: There is no significant ($\alpha = 0.05$) increase in the mortality rate due to pipe smoking.

Estimating a difference in proportions using confidence intervals
Situation
• We have two populations (1 and 2).
• Let $p_1$ denote the probability (proportion) of "success" in population 1.
• Let $p_2$ denote the probability (proportion) of "success" in population 2.
• The objective is to estimate the difference in the two population proportions, $d = p_1 - p_2$.

Confidence interval for $d$ at level $100P\% = 100(1 - \alpha)\%$:
$$\hat{p}_1 - \hat{p}_2 \pm z_{\alpha/2}\,\sigma_{\hat{p}_1 - \hat{p}_2} \approx \hat{p}_1 - \hat{p}_2 \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$

Example
• Estimating the increase in the mortality rate for pipe smokers over that for nonsmokers, $d = p_2 - p_1$, with 95% confidence ($z_{0.025} = 1.960$):
$$\hat{p}_2 - \hat{p}_1 \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$
$$= 0.1343 - 0.1097 \pm 1.960\sqrt{\frac{0.1097(1 - 0.1097)}{1067} + \frac{0.1343(1 - 0.1343)}{402}}$$
$$= 0.0247 \pm 0.0382 = -0.0136 \text{ to } 0.0629 \quad (-1.36\% \text{ to } 6.29\%)$$

Comparing Means
Situation
• We have two normal populations (1 and 2).
• Let $\mu_1$ and $\sigma_1$ denote the mean and standard deviation of population 1.
• Let $\mu_2$ and $\sigma_2$ denote the mean and standard deviation of population 2.
• Let $x_1, x_2, x_3, \ldots, x_n$ denote a sample from normal population 1.
• Let $y_1, y_2, y_3, \ldots, y_m$ denote a sample from normal population 2.
• The objective is to compare the two population means.

We want to test either:
1. $H_0: \mu_1 = \mu_2$ vs $H_A: \mu_1 \ne \mu_2$, or
2. $H_0: \mu_1 \le \mu_2$ vs $H_A: \mu_1 > \mu_2$, or
3. $H_0: \mu_1 \ge \mu_2$ vs $H_A: \mu_1 < \mu_2$.

Consider the test statistic:
$$z = \frac{\bar{x} - \bar{y}}{\sigma_{\bar{x} - \bar{y}}} = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}}} \approx \frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}}}$$
If $H_0: \mu_1 = \mu_2$ is true, then
• $z = \dfrac{\bar{x} - \bar{y}}{\sqrt{\sigma_1^2/n + \sigma_2^2/m}}$ will have a standard normal distribution;
• this will also be approximately true for the version obtained by replacing $\sigma_1$ by $s_x$ and $\sigma_2$ by $s_y$, provided the sample sizes $n$ and $m$ are large (greater than 30).

Note:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}, \quad s_x = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}}, \quad \bar{y} = \frac{\sum_{i=1}^{m} y_i}{m}, \quad s_y = \sqrt{\frac{\sum_{i=1}^{m}(y_i - \bar{y})^2}{m - 1}}$$

The Critical Region
Alternative hypothesis $H_A$: critical region
• $H_A: \mu_1 \ne \mu_2$: $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$
• $H_A: \mu_1 > \mu_2$: $z > z_{\alpha}$
• $H_A: \mu_1 < \mu_2$: $z < -z_{\alpha}$

Example
• A study was interested in determining whether an exercise program had some effect on the reduction of blood pressure in subjects with abnormally high blood pressure.
• For this purpose, a sample of $n = 500$ patients with abnormally high blood pressure were required to adhere to the exercise regime.
• A second sample of $m = 400$ patients with abnormally high blood pressure were not required to adhere to the exercise regime.
• After a period of one year, the reduction in blood pressure was measured for each patient in the study.

We want to test:
$H_0: \mu_1 \le \mu_2$ (the exercise group did not have a higher average reduction in blood pressure)
vs $H_A: \mu_1 > \mu_2$ (the exercise group did have a higher average reduction in blood pressure)

The test statistic:
$$z = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}}} \approx \frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}}}$$

Suppose the data have been collected and:
$\bar{x} = 10.67$, $s_x = 3.895$, $\bar{y} = 7.83$, $s_y = 4.224$

The test statistic:
$$z = \frac{10.67 - 7.83}{\sqrt{\frac{3.895^2}{500} + \frac{4.224^2}{400}}} = \frac{2.84}{0.273765} = 10.4$$

We reject $H_0$ if $z > z_{\alpha} = z_{0.05} = 1.645$.
This is true, hence we reject $H_0$.
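The arithmetic of this example can be reproduced with a short Python sketch using only the standard library. The helper name `two_sample_z` is hypothetical, introduced here for illustration. Like the text, it substitutes $s_x$ and $s_y$ for $\sigma_1$ and $\sigma_2$, which is justified only for large samples.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(xbar, sx, n, ybar, sy, m):
    """Large-sample z statistic for H0: mu1 = mu2.

    The sample SDs sx and sy stand in for sigma_1 and sigma_2, which is
    only reasonable when n and m are large (the text suggests > 30).
    Returns (z, estimated standard error of xbar - ybar).
    """
    se = sqrt(sx**2 / n + sy**2 / m)
    return (xbar - ybar) / se, se


# Summary statistics from the blood-pressure example
# (x = exercise group, n = 500; y = control group, m = 400).
z, se = two_sample_z(10.67, 3.895, 500, 7.83, 4.224, 400)

alpha = 0.05
# One-sided HA: mu1 > mu2, so the critical value is z_alpha (about 1.645).
z_crit = NormalDist().inv_cdf(1 - alpha)

print(f"se = {se:.6f}, z = {z:.2f}, z_crit = {z_crit:.3f}")
print("reject H0" if z > z_crit else "do not reject H0")
```

The computed statistic is about 10.37, which the text rounds to 10.4, far beyond the one-sided critical value.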
Conclusion: There is a significant ($\alpha = 0.05$) effect of the exercise regime on the reduction in blood pressure.

Estimating a difference in means using confidence intervals
Situation
• We have two populations (1 and 2).
• Let $\mu_1$ denote the mean of population 1.
• Let $\mu_2$ denote the mean of population 2.
• The objective is to estimate the difference in the two population means, $d = \mu_1 - \mu_2$.

Confidence interval for $d = \mu_1 - \mu_2$:
$$\hat{\mu}_1 - \hat{\mu}_2 \pm z_{\alpha/2}\,\sigma_{\hat{\mu}_1 - \hat{\mu}_2} \approx \bar{x} - \bar{y} \pm z_{\alpha/2}\sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}}$$

Example
• Estimating the increase in the average reduction in blood pressure due to the exercise regime, $d = \mu_1 - \mu_2$, with 95% confidence:
$$\bar{x} - \bar{y} \pm z_{\alpha/2}\sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}} = 10.67 - 7.83 \pm 1.960\sqrt{\frac{3.895^2}{500} + \frac{4.224^2}{400}}$$
$$= 2.84 \pm 1.96(0.273765) = 2.84 \pm 0.537 = 2.303 \text{ to } 3.377$$

Comparing Means – small samples
Situation
• We have two normal populations (1 and 2).
• Let $\mu_1$ and $\sigma_1$ denote the mean and standard deviation of population 1.
• Let $\mu_2$ and $\sigma_2$ denote the mean and standard deviation of population 2.
• Let $x_1, x_2, x_3, \ldots, x_n$ denote a sample from normal population 1.
• Let $y_1, y_2, y_3, \ldots, y_m$ denote a sample from normal population 2.
• The objective is to compare the two population means.

We want to test either:
1. $H_0: \mu_1 = \mu_2$ vs $H_A: \mu_1 \ne \mu_2$, or
2. $H_0: \mu_1 \le \mu_2$ vs $H_A: \mu_1 > \mu_2$, or
3. $H_0: \mu_1 \ge \mu_2$ vs $H_A: \mu_1 < \mu_2$.

Consider the test statistic:
$$z = \frac{\bar{x} - \bar{y}}{\sigma_{\bar{x} - \bar{y}}} = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}}} \approx \frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}}}$$
If the sample sizes ($n$ and $m$) are large, the statistic
$$t = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}}}$$
will have approximately a standard normal distribution. This will not be the case if the sample sizes ($n$ and $m$) are small.

The t test for comparing means – small samples (equal variances)
Situation
• We have two normal populations (1 and 2).
• Let $\mu_1$ and $\sigma$ denote the mean and standard deviation of population 1.
• Let $\mu_2$ and $\sigma$ denote the mean and standard deviation of population 2.
• Note: we assume that the standard deviation for each population is the same: $\sigma_1 = \sigma_2 = \sigma$.

Let
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}, \quad s_x = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}}, \quad \bar{y} = \frac{\sum_{i=1}^{m} y_i}{m}, \quad s_y = \sqrt{\frac{\sum_{i=1}^{m}(y_i - \bar{y})^2}{m - 1}}$$

The pooled estimate of $\sigma$
Note: both $s_x$ and $s_y$ are estimators of $\sigma$.
These can be combined to form a single estimator of $\sigma$, $s_{\text{Pooled}}$:
$$s_{\text{Pooled}} = \sqrt{\frac{(n-1)s_x^2 + (m-1)s_y^2}{n + m - 2}}$$

The test statistic:
$$t = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_{\text{Pooled}}^2}{n} + \frac{s_{\text{Pooled}}^2}{m}}} = \frac{\bar{x} - \bar{y}}{s_{\text{Pooled}}\sqrt{\frac{1}{n} + \frac{1}{m}}}$$
If $\mu_1 = \mu_2$, this statistic has a t distribution with $n + m - 2$ degrees of freedom.

The Critical Region
Alternative hypothesis $H_A$: critical region
• $H_A: \mu_1 \ne \mu_2$: $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$
• $H_A: \mu_1 > \mu_2$: $t > t_{\alpha}$
• $H_A: \mu_1 < \mu_2$: $t < -t_{\alpha}$
Here $t_{\alpha/2}$ and $t_{\alpha}$ are critical points of the t distribution with $n + m - 2$ degrees of freedom.

Example
• A study was interested in determining whether administration of a drug reduces cancerous tumour size.
• For this purpose, $n + m = 9$ test animals are implanted with a cancerous tumour.
• $n = 3$ are selected at random and administered the drug.
• The remaining $m = 6$ are left untreated.
• Final tumour sizes are measured at the end of the test period.

We want to test:
$H_0: \mu_1 \ge \mu_2$ (the treated group did not have a lower average final tumour size)
vs $H_A: \mu_1 < \mu_2$ (the treated group did have a lower average final tumour size)

The test statistic:
$$t = \frac{\bar{x} - \bar{y}}{s_{\text{Pooled}}\sqrt{\frac{1}{n} + \frac{1}{m}}}$$

Suppose the data have been collected:
Drug treated: 1.89, 1.79, 1.29
Untreated: 2.08, 1.28, 1.75, 1.90, 2.32, 2.16

$\bar{x} = 1.657$, $s_x = 0.3215$; $\bar{y} = 1.915$, $s_y = 0.3693$

The test statistic:
$$s_{\text{Pooled}} = \sqrt{\frac{(n-1)s_x^2 + (m-1)s_y^2}{n + m - 2}} = \sqrt{\frac{2(0.3215)^2 + 5(0.3693)^2}{7}} = 0.3563$$
$$t = \frac{1.657 - 1.915}{0.3563\sqrt{\frac{1}{3} + \frac{1}{6}}} = \frac{-0.258}{0.252} = -1.025$$

We reject $H_0$ if $t < -t_{\alpha} = -t_{0.05} = -1.895$, with d.f. $= n + m - 2 = 7$.
This is not true, hence we do not reject $H_0$.
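A minimal Python sketch of the pooled t test applied to the tumour data, using only the standard library. The helper `pooled_t` is hypothetical and introduced for illustration. Because the standard library has no t-distribution quantile function, the critical value 1.895 is copied from the text rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

# Final tumour sizes from the example.
treated = [1.89, 1.79, 1.29]                      # drug treated, n = 3
untreated = [2.08, 1.28, 1.75, 1.90, 2.32, 2.16]  # untreated, m = 6


def pooled_t(x, y):
    """Pooled two-sample t statistic, assuming sigma_1 = sigma_2.

    Returns (t, s_pooled, degrees of freedom).
    """
    n, m = len(x), len(y)
    sx, sy = stdev(x), stdev(y)  # sample SDs (divisors n - 1 and m - 1)
    s_pooled = sqrt(((n - 1) * sx**2 + (m - 1) * sy**2) / (n + m - 2))
    t = (mean(x) - mean(y)) / (s_pooled * sqrt(1 / n + 1 / m))
    return t, s_pooled, n + m - 2


t, s_pooled, df = pooled_t(treated, untreated)

# t_0.05 with 7 d.f., copied from the text (no t quantile in the stdlib).
t_crit = 1.895

print(f"s_pooled = {s_pooled:.4f}, t = {t:.3f}, df = {df}")
# One-sided HA: mu1 < mu2, so reject H0 only if t < -t_crit.
print("reject H0" if t < -t_crit else "do not reject H0")
```

Working from the raw observations rather than the rounded summary statistics gives the same $s_{\text{Pooled}} \approx 0.3563$ and $t \approx -1.025$ as the hand calculation.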
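Conclusion: The drug treatment does not result in a significantly ($\alpha = 0.05$) smaller final tumour size.

Summary of Tests

One-Sample Tests
• Sample from the normal distribution with unknown mean and known variance (testing $\mu$)
  Test statistic: $z = \dfrac{\sqrt{n}\,(\bar{x} - \mu_0)}{\sigma}$; $H_0: \mu = \mu_0$
  $H_A: \mu \ne \mu_0$: $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$
  $H_A: \mu > \mu_0$: $z > z_{\alpha}$
  $H_A: \mu < \mu_0$: $z < -z_{\alpha}$
• Sample from the normal distribution with unknown mean and unknown variance (testing $\mu$)
  Test statistic: $t = \dfrac{\sqrt{n}\,(\bar{x} - \mu_0)}{s}$; $H_0: \mu = \mu_0$
  $H_A: \mu \ne \mu_0$: $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$
  $H_A: \mu > \mu_0$: $t > t_{\alpha}$
  $H_A: \mu < \mu_0$: $t < -t_{\alpha}$
• Testing a binomial probability
  Test statistic: $z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$; $H_0: p = p_0$
  $H_A: p \ne p_0$: $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$
  $H_A: p > p_0$: $z > z_{\alpha}$
  $H_A: p < p_0$: $z < -z_{\alpha}$
• Sample from the normal distribution with unknown mean and unknown variance (testing $\sigma$)
  Test statistic: $U = \dfrac{(n-1)s^2}{\sigma_0^2}$; $H_0: \sigma = \sigma_0$
  $H_A: \sigma \ne \sigma_0$: $U < \chi^2_{1-\alpha/2}(n-1)$ or $U > \chi^2_{\alpha/2}(n-1)$
  $H_A: \sigma > \sigma_0$: $U > \chi^2_{\alpha}(n-1)$
  $H_A: \sigma < \sigma_0$: $U < \chi^2_{1-\alpha}(n-1)$

Two-Sample Tests
• Two independent samples from the normal distribution with unknown means and known variances (testing $\mu_1 - \mu_2$)
  Test statistic: $z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$; $H_0: \mu_1 = \mu_2$
  $H_A: \mu_1 \ne \mu_2$: $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$
  $H_A: \mu_1 > \mu_2$: $z > z_{\alpha}$
  $H_A: \mu_1 < \mu_2$: $z < -z_{\alpha}$
• Two independent samples from the normal distribution with unknown means and unknown but equal variances (testing $\mu_1 - \mu_2$)
  Test statistic: $t = \dfrac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{1/n_1 + 1/n_2}}$; $H_0: \mu_1 = \mu_2$
  $H_A: \mu_1 \ne \mu_2$: $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$
  $H_A: \mu_1 > \mu_2$: $t > t_{\alpha}$
  $H_A: \mu_1 < \mu_2$: $t < -t_{\alpha}$
• Testing the difference between two binomial probabilities, $\pi_1 - \pi_2$ ($\pi$ is used here instead of $p$)
  Test statistic: $z = \dfrac{\hat{\pi}_1 - \hat{\pi}_2}{\sqrt{\hat{\pi}(1 - \hat{\pi})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$; $H_0: \pi_1 = \pi_2$
  $H_A: \pi_1 \ne \pi_2$: $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$
  $H_A: \pi_1 > \pi_2$: $z > z_{\alpha}$
  $H_A: \pi_1 < \pi_2$: $z < -z_{\alpha}$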
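The last two-sample row can be checked against the pipe-smoking example with a minimal Python sketch (standard library only). The helpers `two_proportion_z` and `diff_ci` are hypothetical names introduced for illustration, and the code writes $p$ rather than $\pi$. The test uses the pooled estimate $\hat{p}$, while the confidence interval uses the unpooled standard error, as in the text.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: p1 = p2 using the pooled estimate p_hat."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_hat = (x1 + x2) / (n1 + n2)  # estimate of the common p under H0
    se0 = sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se0


def diff_ci(x1, n1, x2, n2, level=0.95):
    """Confidence interval for p2 - p1 using the unpooled standard error."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    z_half = NormalDist().inv_cdf(1 - (1 - level) / 2)  # z_{alpha/2}, about 1.960
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    d = p2_hat - p1_hat
    return d - z_half * se, d + z_half * se


# Pipe-smoking example: population 1 = nonsmokers, population 2 = pipe smokers.
z = two_proportion_z(117, 1067, 54, 402)
z_crit = NormalDist().inv_cdf(0.95)  # z_0.05 for the one-sided HA: p1 < p2
lo, hi = diff_ci(117, 1067, 54, 402)

print(f"z = {z:.3f}; reject H0 if z < {-z_crit:.3f}")
print(f"95% CI for p2 - p1: {lo:.4f} to {hi:.4f}")
```

The interval contains 0, which agrees with the test's failure to reject $H_0$.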