Lecture 19

Comparing Populations
Proportions and means
The sampling distribution of differences of Normal
Random Variables
If X and Y denote two independent normal random
variables, then :
D = X – Y is normal with
mean $\mu_D = \mu_X - \mu_Y$
standard deviation $\sigma_D = \sqrt{\sigma_X^2 + \sigma_Y^2}$
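As a quick illustration of this rule, here is a minimal Python sketch. The function name `difference_of_normals` and the numbers are hypothetical, chosen only to show that the means subtract while the variances add.

```python
import math


def difference_of_normals(mu_x, sigma_x, mu_y, sigma_y):
    """Mean and standard deviation of D = X - Y for independent normal X and Y."""
    mu_d = mu_x - mu_y
    # The variances add even though the means subtract.
    sigma_d = math.sqrt(sigma_x ** 2 + sigma_y ** 2)
    return mu_d, sigma_d


# Hypothetical values, for illustration only.
mu_d, sigma_d = difference_of_normals(10.0, 3.0, 4.0, 4.0)
print(mu_d, sigma_d)  # 6.0 5.0
```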
Comparing proportions
Situation
• We have two populations (1 and 2)
• Let p1 denote the probability (proportion) of
“success” in population 1.
• Let p2 denote the probability (proportion) of
“success” in population 2.
• Objective is to compare the two population
proportions
Consider the statistic:
$$D = \hat{p}_1 - \hat{p}_2 = \frac{x_1}{n_1} - \frac{x_2}{n_2}$$
For large samples this statistic has an approximately normal distribution with mean
$$\mu_D = \mu_{\hat{p}_1} - \mu_{\hat{p}_2} = p_1 - p_2$$
and standard deviation
$$\sigma_D = \sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\sigma_{\hat{p}_1}^2 + \sigma_{\hat{p}_2}^2} = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}} \approx \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$
Thus
$$z = \frac{D - \mu_D}{\sigma_D} = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sigma_{\hat{p}_1 - \hat{p}_2}} = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}} \approx \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}}$$
has approximately a standard normal distribution.
We want to test either:
1. $H_0: p_1 = p_2$ vs $H_A: p_1 \neq p_2$
or
2. $H_0: p_1 \le p_2$ vs $H_A: p_1 > p_2$
or
3. $H_0: p_1 \ge p_2$ vs $H_A: p_1 < p_2$
If p1 = p2 (= p, say) then the test statistic:
$$z = \frac{D - \mu_D}{\sigma_D} = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}} = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{p(1 - p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \approx \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
has approximately a standard normal distribution, where
$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$$
is an estimate of the common value of p1 and p2.
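Thus for comparing two binomial probabilities p1 and p2
The test statistic
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
where
$$\hat{p}_1 = \frac{x_1}{n_1}, \quad \hat{p}_2 = \frac{x_2}{n_2} \quad \text{and} \quad \hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$$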
The Critical Region

| The Alternative Hypothesis $H_A$ | The Critical Region |
|---|---|
| $H_A: p_1 \neq p_2$ | $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$ |
| $H_A: p_1 > p_2$ | $z > z_{\alpha}$ |
| $H_A: p_1 < p_2$ | $z < -z_{\alpha}$ |
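The three critical regions can be sketched as a small decision rule in Python. This is only an illustration: the function name `reject_h0` and the labels "two-sided", "greater" and "less" are assumptions, not part of the lecture. The critical points come from the standard library's `statistics.NormalDist`.

```python
from statistics import NormalDist


def reject_h0(z, alpha=0.05, alternative="two-sided"):
    """Return True if z falls in the critical region for the given alternative."""
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    if alternative == "two-sided":      # H_A: p1 != p2
        z_crit = std_normal.inv_cdf(1 - alpha / 2)
        return z < -z_crit or z > z_crit
    z_crit = std_normal.inv_cdf(1 - alpha)
    if alternative == "greater":        # H_A: p1 > p2
        return z > z_crit
    if alternative == "less":           # H_A: p1 < p2
        return z < -z_crit
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")


# Hypothetical z values, for illustration only.
print(reject_h0(2.0))                         # True: |z| exceeds z_{0.025}, about 1.96
print(reject_h0(1.7, alternative="greater"))  # True: z exceeds z_{0.05}, about 1.645
print(reject_h0(1.7))                         # False: |z| is below 1.96
```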
Example
• In a national study to determine if there was an
increase in mortality due to pipe smoking, a
random sample of n1 = 1067 male nonsmoking
pensioners were observed for a five-year period.
• In addition a sample of n2 = 402 male pensioners
who had smoked a pipe for more than six years
were observed for the same five-year period.
• At the end of the five-year period, x1 = 117 of the
nonsmoking pensioners had died while x2 = 54 of
the pipe-smoking pensioners had died.
• Is the mortality rate for pipe smokers higher than that for non-smokers?
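We want to test:
$H_0: p_1 \ge p_2$ vs $H_A: p_1 < p_2$
The test statistic:
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sigma_{\hat{p}_1 - \hat{p}_2}} \approx \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$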
Note:
$$\hat{p}_1 = \frac{x_1}{n_1} = \frac{117}{1067} = 0.1097$$
$$\hat{p}_2 = \frac{x_2}{n_2} = \frac{54}{402} = 0.1343$$
$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{117 + 54}{1067 + 402} = \frac{171}{1469} = 0.1164$$
The test statistic:
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{0.1097 - 0.1343}{\sqrt{0.1164(1 - 0.1164)\left(\frac{1}{1067} + \frac{1}{402}\right)}} = -1.315$$
We reject H0 if:
$z < -z_{\alpha} = -z_{0.05} = -1.645$
Not true (−1.315 > −1.645), hence we accept H0.
Conclusion: There is not a significant (α = 0.05) increase in the mortality rate due to pipe-smoking.
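The arithmetic of this example can be checked with a short Python sketch. The function name `two_proportion_z` is hypothetical; the counts are the ones given in the example, and the pooled estimate is used in the standard error as in the test statistic above.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for testing H0: p1 = p2."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Pooled estimate of the common proportion under H0.
    p_pooled = (x1 + x2) / (n1 + n2)
    std_error = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / std_error


# Nonsmokers: 117 deaths out of 1067; pipe smokers: 54 deaths out of 402.
z = two_proportion_z(117, 1067, 54, 402)
print(round(z, 3))   # about -1.315
print(z < -1.645)    # False: z is not in the critical region z < -z_0.05
```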
Estimating a difference in proportions using
confidence intervals
Situation
• We have two populations (1 and 2)
• Let p1 denote the probability (proportion) of
“success” in population 1.
• Let p2 denote the probability (proportion) of
“success” in population 2.
• Objective is to estimate the difference in the
two population proportions d = p1 – p2.
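Confidence Interval for d = p1 – p2
100P% = 100(1 – α)% :
$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\,\sigma_{\hat{p}_1 - \hat{p}_2}$$
$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$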
Example
• Estimating the increase in the mortality rate for pipe smokers over that for nonsmokers, d = p2 – p1
$$(\hat{p}_2 - \hat{p}_1) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$
$$= (0.1343 - 0.1097) \pm 1.960\sqrt{\frac{0.1097(1 - 0.1097)}{1067} + \frac{0.1343(1 - 0.1343)}{402}}$$
= 0.0247 ± 0.0382
= −0.0136 to 0.0629
= −1.36% to 6.29%
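A minimal Python sketch of this interval, under the same large-sample normal approximation. The function name `proportion_diff_ci` and the `confidence` parameter are hypothetical; the unpooled standard error is used, as in the confidence interval formula.

```python
import math
from statistics import NormalDist


def proportion_diff_ci(x1, n1, x2, n2, confidence=0.95):
    """Large-sample confidence interval for d = p2 - p1."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Unpooled standard error: no assumption that p1 = p2.
    std_error = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    d_hat = p2_hat - p1_hat
    return d_hat - z_crit * std_error, d_hat + z_crit * std_error


# Nonsmokers: 117 of 1067; pipe smokers: 54 of 402.
low, high = proportion_diff_ci(117, 1067, 54, 402)
print(round(low, 4), round(high, 4))  # about -0.0136 and 0.0629
```

The interval contains 0, which agrees with the test failing to find a significant increase.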
Comparing Means
Situation
• We have two normal populations (1 and 2)
• Let μ1 and σ1 denote the mean and standard
deviation of population 1.
• Let μ2 and σ2 denote the mean and standard
deviation of population 2.
• Let x1, x2, x3 , … , xn denote a sample from a
normal population 1.
• Let y1, y2, y3 , … , ym denote a sample from a
normal population 2.
• Objective is to compare the two population means
We want to test either:
1. $H_0: \mu_1 = \mu_2$ vs $H_A: \mu_1 \neq \mu_2$
or
2. $H_0: \mu_1 \le \mu_2$ vs $H_A: \mu_1 > \mu_2$
or
3. $H_0: \mu_1 \ge \mu_2$ vs $H_A: \mu_1 < \mu_2$
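Consider the test statistic:
$$z = \frac{\bar{x} - \bar{y}}{\sigma_{\bar{x} - \bar{y}}} = \frac{\bar{x} - \bar{y}}{\sqrt{\sigma_{\bar{x}}^2 + \sigma_{\bar{y}}^2}} = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}}} \approx \frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}}}$$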
If $H_0: \mu_1 = \mu_2$ is true, then:
$$z = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}}} \approx \frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}}}$$
• will have a standard Normal distribution
• This will also be true for the approximation
(obtained by replacing σ1 by sx and σ2 by sy) if
the sample sizes n and m are large (greater than
30)
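Note:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}, \qquad s_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
$$\bar{y} = \frac{\sum_{i=1}^{m} y_i}{m}, \qquad s_y = \sqrt{\frac{\sum_{i=1}^{m} (y_i - \bar{y})^2}{m - 1}}$$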
| The Alternative Hypothesis $H_A$ | The Critical Region |
|---|---|
| $H_A: \mu_1 \neq \mu_2$ | $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$ |
| $H_A: \mu_1 > \mu_2$ | $z > z_{\alpha}$ |
| $H_A: \mu_1 < \mu_2$ | $z < -z_{\alpha}$ |
Example
• A study was interested in determining if an
exercise program had some effect on reduction of
Blood Pressure in subjects with abnormally high
blood pressure.
• For this purpose a sample of n = 500 patients with
abnormally high blood pressure were required to
adhere to the exercise regime.
• A second sample m = 400 of patients with
abnormally high blood pressure were not required
to adhere to the exercise regime.
• After a period of one year the reduction in blood
pressure was measured for each patient in the
study.
We want to test:
$H_0: \mu_1 \le \mu_2$
The exercise group did not have a higher
average reduction in blood pressure
vs
$H_A: \mu_1 > \mu_2$
The exercise group did have a higher
average reduction in blood pressure
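The test statistic:
$$z = \frac{\bar{x} - \bar{y}}{\sigma_{\bar{x} - \bar{y}}} = \frac{\bar{x} - \bar{y}}{\sqrt{\sigma_{\bar{x}}^2 + \sigma_{\bar{y}}^2}} = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}}} \approx \frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}}}$$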
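Suppose the data has been collected and:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = 10.67, \qquad s_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} = 3.895$$
$$\bar{y} = \frac{\sum_{i=1}^{m} y_i}{m} = 7.83, \qquad s_y = \sqrt{\frac{\sum_{i=1}^{m} (y_i - \bar{y})^2}{m - 1}} = 4.224$$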
The test statistic:
$$z = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}}} = \frac{10.67 - 7.83}{\sqrt{\frac{(3.895)^2}{500} + \frac{(4.224)^2}{400}}} = \frac{2.84}{0.273765} = 10.4$$
We reject H0 if:
$z > z_{\alpha} = z_{0.05} = 1.645$
True hence we reject H0.
Conclusion: There is a significant (α = 0.05)
effect due to the exercise regime on the
reduction in blood pressure
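The calculation in this example can be sketched in Python from the summary statistics alone. The function name `two_sample_z` is hypothetical, and the large-sample approximation (σ1 and σ2 replaced by sx and sy) is assumed, as in the slides.

```python
import math
from statistics import NormalDist


def two_sample_z(x_bar, s_x, n, y_bar, s_y, m):
    """Large-sample z statistic for comparing two means from summary statistics."""
    std_error = math.sqrt(s_x ** 2 / n + s_y ** 2 / m)
    return (x_bar - y_bar) / std_error


# Exercise group: n = 500; comparison group: m = 400.
z = two_sample_z(10.67, 3.895, 500, 7.83, 4.224, 400)
z_crit = NormalDist().inv_cdf(0.95)  # z_0.05, about 1.645

print(round(z, 1))  # about 10.4
print(z > z_crit)   # True: reject H0 in favour of mu_1 > mu_2
```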
Estimating a difference in means using
confidence intervals
Situation
• We have two populations (1 and 2)
• Let μ1 denote the mean of population 1.
• Let μ2 denote the mean of population 2.
• Objective is to estimate the difference in the
two population means d = μ1 – μ2.
Confidence Interval for
d = μ1 – μ2
$$\hat{\mu}_1 - \hat{\mu}_2 \pm z_{\alpha/2}\,\sigma_{\hat{\mu}_1 - \hat{\mu}_2}$$
$$\bar{x} - \bar{y} \pm z_{\alpha/2}\sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}}$$
Example
• Estimating the increase in the average
reduction in blood pressure due to the
exercise regime, d = μ1 – μ2
$$\bar{x} - \bar{y} \pm z_{\alpha/2}\sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}}$$
$$= (10.67 - 7.83) \pm 1.960\sqrt{\frac{(3.895)^2}{500} + \frac{(4.224)^2}{400}}$$
= 2.84 ± 1.96(0.273765)
= 2.84 ± 0.537
= 2.303 to 3.377
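A short Python sketch that computes this interval directly from the summary statistics. The function name `mean_diff_ci` and the `confidence` parameter are hypothetical; large samples are assumed so that the normal critical point applies.

```python
import math
from statistics import NormalDist


def mean_diff_ci(x_bar, s_x, n, y_bar, s_y, m, confidence=0.95):
    """Large-sample confidence interval for d = mu_1 - mu_2."""
    std_error = math.sqrt(s_x ** 2 / n + s_y ** 2 / m)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    d_hat = x_bar - y_bar
    return d_hat - z_crit * std_error, d_hat + z_crit * std_error


low, high = mean_diff_ci(10.67, 3.895, 500, 7.83, 4.224, 400)
print(round(low, 3), round(high, 3))  # about 2.303 and 3.377
```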
Comparing Means – small samples
Situation
• We have two normal populations (1 and 2)
• Let μ1 and σ1 denote the mean and standard
deviation of population 1.
• Let μ2 and σ2 denote the mean and standard
deviation of population 2.
• Let x1, x2, x3 , … , xn denote a sample from a
normal population 1.
• Let y1, y2, y3 , … , ym denote a sample from a
normal population 2.
• Objective is to compare the two population means
We want to test either:
1. $H_0: \mu_1 = \mu_2$ vs $H_A: \mu_1 \neq \mu_2$
or
2. $H_0: \mu_1 \le \mu_2$ vs $H_A: \mu_1 > \mu_2$
or
3. $H_0: \mu_1 \ge \mu_2$ vs $H_A: \mu_1 < \mu_2$
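Consider the test statistic:
$$z = \frac{\bar{x} - \bar{y}}{\sigma_{\bar{x} - \bar{y}}} = \frac{\bar{x} - \bar{y}}{\sqrt{\sigma_{\bar{x}}^2 + \sigma_{\bar{y}}^2}} = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}}} \approx \frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}}}$$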
If the sample sizes (m and n) are large the
statistic
$$t = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}}}$$
will have approximately a standard
normal distribution
This will not be the case if sample
sizes (m and n) are small
The t test – for comparing means –
small samples (equal variances)
Situation
• We have two normal populations (1 and 2)
• Let μ1 and σ denote the mean and standard
deviation of population 1.
• Let μ2 and σ denote the mean and standard
deviation of population 2.
• Note: we assume that the standard deviation
for each population is the same.
σ1 = σ2 = σ
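Let
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}, \qquad s_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
$$\bar{y} = \frac{\sum_{i=1}^{m} y_i}{m}, \qquad s_y = \sqrt{\frac{\sum_{i=1}^{m} (y_i - \bar{y})^2}{m - 1}}$$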
The pooled estimate of σ.
Note: both sx and sy are estimators of σ.
These can be combined to form a single
estimator of σ, sPooled.
$$s_{Pooled} = \sqrt{\frac{(n - 1)s_x^2 + (m - 1)s_y^2}{n + m - 2}}$$
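The test statistic:
$$t = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_{Pooled}^2}{n} + \frac{s_{Pooled}^2}{m}}} = \frac{\bar{x} - \bar{y}}{s_{Pooled}\sqrt{\frac{1}{n} + \frac{1}{m}}}$$
If μ1 = μ2 this statistic has a t distribution
with n + m – 2 degrees of freedom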
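| The Alternative Hypothesis $H_A$ | The Critical Region |
|---|---|
| $H_A: \mu_1 \neq \mu_2$ | $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$ |
| $H_A: \mu_1 > \mu_2$ | $t > t_{\alpha}$ |
| $H_A: \mu_1 < \mu_2$ | $t < -t_{\alpha}$ |

$t_{\alpha/2}$ and $t_{\alpha}$ are critical points under the t distribution with
degrees of freedom n + m – 2.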
Example
• A study was interested in determining if
administration of a drug reduces cancerous
tumor size.
• For this purpose n +m = 9 test animals are
implanted with a cancerous tumor.
• n = 3 are selected at random and
administered the drug.
• The remaining m = 6 are left untreated.
• Final tumour sizes are measured at the end
of the test period
We want to test:
$H_0: \mu_1 \ge \mu_2$
The treated group did not have a lower
average final tumour size.
vs
$H_A: \mu_1 < \mu_2$
The treated group did have a lower
average final tumour size.
The test statistic:
$$t = \frac{\bar{x} - \bar{y}}{s_{Pooled}\sqrt{\frac{1}{n} + \frac{1}{m}}}$$
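Suppose the data has been collected and:

drug treated:  1.89  1.79  1.29
untreated:     2.08  1.28  1.75  1.90  2.32  2.16

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = 1.657, \qquad s_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} = 0.3215$$
$$\bar{y} = \frac{\sum_{i=1}^{m} y_i}{m} = 1.915, \qquad s_y = \sqrt{\frac{\sum_{i=1}^{m} (y_i - \bar{y})^2}{m - 1}} = 0.3693$$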
The test statistic:
$$s_{Pooled} = \sqrt{\frac{(n - 1)s_x^2 + (m - 1)s_y^2}{n + m - 2}} = \sqrt{\frac{2(0.3215)^2 + 5(0.3693)^2}{7}} = 0.3563$$
$$t = \frac{1.657 - 1.915}{0.3563\sqrt{\frac{1}{3} + \frac{1}{6}}} = \frac{-0.258}{0.252} = -1.025$$
We reject H0 if:
$t < -t_{\alpha} = -t_{0.05} = -1.895$
with d.f. = n + m – 2 = 7
Since −1.025 > −1.895, we accept H0.
Conclusion: The drug treatment does not
result in a significantly (α = 0.05) smaller final
tumour size.
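The pooled t calculation can be sketched in Python from the raw measurements. Two things here are assumptions rather than part of the slides: the function name `pooled_t`, and the split of the nine values into groups, which was chosen because it is the split that reproduces the stated means (1.657 and 1.915) and standard deviations. The standard library has no t distribution, so the critical point 1.895 is taken from the example rather than computed.

```python
import math
from statistics import mean, stdev


def pooled_t(x, y):
    """Pooled-variance t statistic, its degrees of freedom, and s_Pooled."""
    n, m = len(x), len(y)
    df = n + m - 2
    # stdev() uses the n - 1 divisor, matching s_x and s_y in the slides.
    s_pooled = math.sqrt(((n - 1) * stdev(x) ** 2 + (m - 1) * stdev(y) ** 2) / df)
    t = (mean(x) - mean(y)) / (s_pooled * math.sqrt(1 / n + 1 / m))
    return t, df, s_pooled


# Assumed grouping of the nine tumour sizes (see the note above).
treated = [1.89, 1.79, 1.29]
untreated = [2.08, 1.28, 1.75, 1.90, 2.32, 2.16]

t, df, s_pooled = pooled_t(treated, untreated)
t_crit = 1.895  # t_0.05 with 7 d.f., as quoted in the example

print(round(t, 3), df, round(s_pooled, 4))  # about -1.025, 7, 0.3563
print(t < -t_crit)                          # False: H0 is not rejected
```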
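Summary of Tests
One Sample Tests

| Situation | Test Statistic | $H_0$ | $H_A$ | Critical Region |
|---|---|---|---|---|
| Sample from the Normal distribution with unknown mean and known variance (Testing μ) | $z = \frac{\sqrt{n}(\bar{x} - \mu_0)}{\sigma}$ | $\mu = \mu_0$ | $\mu \neq \mu_0$ | $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$ |
| | | | $\mu > \mu_0$ | $z > z_{\alpha}$ |
| | | | $\mu < \mu_0$ | $z < -z_{\alpha}$ |
| Sample from the Normal distribution with unknown mean and unknown variance (Testing μ) | $t = \frac{\sqrt{n}(\bar{x} - \mu_0)}{s}$ | $\mu = \mu_0$ | $\mu \neq \mu_0$ | $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$ |
| | | | $\mu > \mu_0$ | $t > t_{\alpha}$ |
| | | | $\mu < \mu_0$ | $t < -t_{\alpha}$ |
| Testing of a binomial probability p | $z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1 - p_0)}{n}}}$ | $p = p_0$ | $p \neq p_0$ | $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$ |
| | | | $p > p_0$ | $z > z_{\alpha}$ |
| | | | $p < p_0$ | $z < -z_{\alpha}$ |
| Sample from the Normal distribution with unknown mean and unknown variance (Testing σ) | $U = \frac{(n - 1)s^2}{\sigma_0^2}$ | $\sigma = \sigma_0$ | $\sigma \neq \sigma_0$ | $U < \chi^2_{1-\alpha/2}(n - 1)$ or $U > \chi^2_{\alpha/2}(n - 1)$ |
| | | | $\sigma > \sigma_0$ | $U > \chi^2_{\alpha}(n - 1)$ |
| | | | $\sigma < \sigma_0$ | $U < \chi^2_{1-\alpha}(n - 1)$ |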
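Two Sample Tests

| Situation | Test Statistic | $H_0$ | $H_A$ | Critical Region |
|---|---|---|---|---|
| Two independent samples from the Normal distribution with unknown means and known variances (Testing μ1 - μ2) | $z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$ | $\mu_1 = \mu_2$ | $\mu_1 \neq \mu_2$ | $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$ |
| | | | $\mu_1 > \mu_2$ | $z > z_{\alpha}$ |
| | | | $\mu_1 < \mu_2$ | $z < -z_{\alpha}$ |
| Two independent samples from the Normal distribution with unknown means and unknown but equal variances (Testing μ1 - μ2) | $t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$ | $\mu_1 = \mu_2$ | $\mu_1 \neq \mu_2$ | $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$ |
| | | | $\mu_1 > \mu_2$ | $t > t_{\alpha}$ |
| | | | $\mu_1 < \mu_2$ | $t < -t_{\alpha}$ |
| Testing the difference between two binomial probabilities, p1 - p2 | $z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$ | $p_1 = p_2$ | $p_1 \neq p_2$ | $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$ |
| | | | $p_1 > p_2$ | $z > z_{\alpha}$ |
| | | | $p_1 < p_2$ | $z < -z_{\alpha}$ |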