Confounding
• Epidemiology relies on observational studies or experiments of
nature
• Often these are poor experiments
— no control for confounding by extraneous influences
• Definition:
A confounder is a variable whose influence we would have
controlled if we had been able to design the natural
experiment.
Example: confounding by age, Fig. 14.1
[Figure 14.1: probability trees for "Unexposed subjects" and "Exposed subjects". Unexposed: age <55 with probability 0.8, age 55+ with probability 0.2. Exposed: age <55 with probability 0.4, age 55+ with probability 0.6. In both groups the probability of failure (F) is 0.1 at age <55 and 0.3 at age 55+; the probability of survival (S) is 0.9 and 0.7.]
• Probability of failure for unexposed:
(0.8 × 0.1) + (0.2 × 0.3) = 0.14
• Probability of failure for exposed:
(0.4 × 0.1) + (0.6 × 0.3) = 0.22
• Difference entirely due to difference in age structure.
• When there is a true effect, its magnitude can be distorted by
such influences.
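The two marginal probabilities above can be checked with a short sketch. The function name `marginal_failure_probability` and the dictionary layout are illustrative choices, not part of the slides; the numbers are the ones shown in Fig. 14.1.

```python
def marginal_failure_probability(age_distribution, failure_probability):
    """Average the age-specific failure probabilities over an age distribution."""
    return sum(age_distribution[age] * failure_probability[age]
               for age in age_distribution)

# Age-specific failure probabilities are identical in both groups (Fig. 14.1).
failure_by_age = {"<55": 0.1, "55+": 0.3}

# Only the age distributions differ between the groups.
unexposed_ages = {"<55": 0.8, "55+": 0.2}
exposed_ages = {"<55": 0.4, "55+": 0.6}

p_unexposed = marginal_failure_probability(unexposed_ages, failure_by_age)
p_exposed = marginal_failure_probability(exposed_ages, failure_by_age)

print(f"unexposed: {p_unexposed:.2f}")   # 0.14
print(f"exposed:   {p_exposed:.2f}")     # 0.22
print(f"crude relative risk: {p_exposed / p_unexposed:.2f}")  # 1.57
```

The crude relative risk of about 1.57 arises although exposure has no effect within either age group.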
Confounding when RR = 2
[Figure: probability trees for "Unexposed subjects" and "Exposed subjects". Unexposed: age <55 with probability 0.8 (F 0.1, S 0.9), age 55+ with probability 0.2 (F 0.2, S 0.8). Exposed: age <55 with probability 0.4 (F 0.2, S 0.8), age 55+ with probability 0.6 (F 0.4, S 0.6).]
Results.
• The true relative risk, RR_T = 0.2/0.1 = 0.4/0.2 = 2
• Probability of failure for unexposed:
(0.8 × 0.1) + (0.2 × 0.2) = 0.12
• Probability of failure for exposed:
(0.4 × 0.2) + (0.6 × 0.4) = 0.32
• The apparent relative risk:
RR_O = 0.32/0.12 = 2.67
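A minimal sketch of the same calculation, assuming the tree probabilities above. The helper `marginal` and the variable names are illustrative. The last line shows that the relative risk comes out as 2 when both groups are given the same age distribution.

```python
def marginal(age_distribution, risk_by_age):
    """Marginal failure probability for a given age distribution."""
    return sum(age_distribution[age] * risk_by_age[age]
               for age in age_distribution)

unexposed_ages = {"<55": 0.8, "55+": 0.2}
exposed_ages = {"<55": 0.4, "55+": 0.6}

# Exposure doubles the failure probability in every age group.
true_rr = 2.0
risk_unexposed = {"<55": 0.1, "55+": 0.2}
risk_exposed = {age: true_rr * risk for age, risk in risk_unexposed.items()}

# Observed comparison: each group keeps its own age distribution.
rr_observed = (marginal(exposed_ages, risk_exposed)
               / marginal(unexposed_ages, risk_unexposed))

# Fair comparison: both groups evaluated on the same age distribution.
rr_same_ages = (marginal(unexposed_ages, risk_exposed)
                / marginal(unexposed_ages, risk_unexposed))

print(f"apparent RR: {rr_observed:.2f}")               # 2.67
print(f"RR with equal age structure: {rr_same_ages:.2f}")  # 2.00
```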
Confounding: schematically.
A variable C is a potential confounder for the relation:
E→O
if it is
• 1) related to the exposure:
E−C
• 2) an independent risk factor for the outcome:
C→O
• 3) not a consequence of the exposure:
i.e. not E→C→O
That is:
E − C
 ↘ ↙
  O
Confounding
A confounder is:
• associated with outcome:
e.g., older persons have higher disease probability,
• associated with the exposure:
e.g., older persons are more / less likely to be exposed,
• not a result of exposure, i.e. not an intermediate variable.
Not a statistical property; cannot be seen from tables; common
sense is required!
Confounding.
The problem is that we do not always get a fair comparison between
exposed and non-exposed.
[Figure: age composition (Young / Old) of the EXPOSED and NON-EXPOSED groups.]
A randomly selected exposed person tends to be older than a
randomly chosen non-exposed.
Controlling confounding, Sect. 14.2
In controlled experiments there are two ways of controlling
confounding:
1. Randomization of subjects to experimental groups so that the
distributions of the confounder are the same.
2. Hold the confounder constant.
Standardization is a classical statistical technique for controlling
for extraneous variables (in particular: age) in the analysis of an
observational study
1. Direct standardization simulates randomization by equalizing
the distribution of extraneous variables.
2. Indirect standardization simulates the second method: holding
extraneous variables constant.
We first discuss direct standardization and then later turn to the
main ways of “holding the confounder constant”:
• stratified (“Mantel-Haenszel”) analysis
• or (more importantly) regression analysis: logistic, Poisson, Cox.
Direct standardization, sect. 14.3
1. Estimate age-specific rates (or risks) in each group,
2. Calculate marginal rates (risks) if the age distribution were fixed
to that of some agreed standard population.
A standard population is another term for a common
age-distribution.
3. Direct standardization is good for illustrative purposes as it
provides absolute rates.
[Figure: the probability trees of Fig. 14.1 for "Unexposed subjects" and "Exposed subjects". Unexposed: age <55 (0.8), 55+ (0.2); exposed: age <55 (0.4), 55+ (0.6); failure probability 0.1 at age <55 and 0.3 at age 55+ in both groups.]
Marginal failure probability (with 50-50 age distribution) is
(0.5 × 0.1) + (0.5 × 0.3) = 0.2 for both groups
The Diet data
D = number of cases, Y = person-years, Rate = D/Y per 1000 person-years, RR = ratio of exposed to unexposed rate.

Current   Exposed (< 2750 kcal)       Unexposed (≥ 2750 kcal)
age         D        Y     Rate         D        Y     Rate       RR
40–49       2    311.9     6.41         4    607.9     6.58     0.97
50–59      12    878.1    13.67         5   1271.1     3.93     3.48
60–69      14    667.5    20.97         8    888.9     9.00     2.33
Total      28   1857.5    15.07        17   2768.9     6.14     2.46
Direct standardization in the diet data.
We can standardize the age-specific rates to a population with equal
numbers of person–years in each age group.
Exposed:
(1/3 × 6.41) + (1/3 × 13.67) + (1/3 × 20.97) = 13.68
Unexposed:
(1/3 × 6.58) + (1/3 × 3.93) + (1/3 × 9.00) = 6.50
Estimate of rate ratio is 13.68/6.50 = 2.10.
Choice of weights
• Sometimes overall age structure of the whole study is used
• Use of a standard age structure can facilitate comparison with
other work.
• In cancer epidemiology standard populations approximating the
European, US or World population age-distribution are used.
• Equal weights essentially give a comparison between cumulative
rates in the two groups
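Direct standardization with equal weights can be sketched as below. The data are the age-specific cases and person-years from the Diet data table; the function name `standardized_rate`, the dictionary layout, and the scaling to rates per 1000 person-years are assumptions made for this illustration.

```python
# (cases D, person-years Y) by age stratum, copied from the Diet data table.
DIET = {
    "40-49": {"exposed": (2, 311.9), "unexposed": (4, 607.9)},
    "50-59": {"exposed": (12, 878.1), "unexposed": (5, 1271.1)},
    "60-69": {"exposed": (14, 667.5), "unexposed": (8, 888.9)},
}

def standardized_rate(data, group, weights, per=1000):
    """Weighted average of the age-specific rates D/Y for one group."""
    total = 0.0
    for age, weight in weights.items():
        cases, person_years = data[age][group]
        total += weight * per * cases / person_years
    return total

# Equal weights: the same number of person-years in every age group.
equal_weights = {age: 1 / len(DIET) for age in DIET}

rate_exposed = standardized_rate(DIET, "exposed", equal_weights)
rate_unexposed = standardized_rate(DIET, "unexposed", equal_weights)
rate_ratio = rate_exposed / rate_unexposed

print(f"exposed:   {rate_exposed:.2f} per 1000 person-years")
print(f"unexposed: {rate_unexposed:.2f} per 1000 person-years")
print(f"standardized rate ratio: {rate_ratio:.2f}")
```

Other choices from the list of weights, such as the age structure of the whole study or an external standard population, only change the `weights` argument; the weights should sum to 1.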
Stratified (Mantel-Haenszel) analysis, Ch. 15.
• Aim is to hold age constant.
• Compare exposed and unexposed persons within age strata.
• Compute a combined estimate of effect over all strata.
• This implies a model in which there is no (systematic) variation
of effect over strata.
• If estimates are similar we combine them, by a suitable average.
If the effect of exposure is the same in all age-strata, we can
re-parameterize rates as:

Age      Exposed (low energy)   Unexposed (high energy)   Rate ratio
40–49    λ01 = θλ00             λ00                       θ
50–59    λ11 = θλ10             λ10                       θ
60–69    λ21 = θλ20             λ20                       θ

This is the proportional hazards model:
For every stratum a: λa1 = θλa0.
θ is the effect of exposure “controlled for” age.
The Mantel-Haenszel estimate
Data

Age (a)          Exposed: low energy (1)   Unexposed: high energy (0)
40–49 (a = 0)    D10, Y10                  D00, Y00
50–59 (a = 1)    D11, Y11                  D01, Y01
60–69 (a = 2)    D12, Y12                  D02, Y02

The MH-estimate for θ is (the weighted average):
\[
\theta_{MH}
= \frac{\sum_a D_{1a} Y_{0a} / (Y_{0a} + Y_{1a})}{\sum_a D_{0a} Y_{1a} / (Y_{0a} + Y_{1a})}
= \frac{\sum_a Q_a}{\sum_a R_a}
= \frac{Q}{R}.
\]
This may be calculated by hand.
Note that only θ is estimated, not the λ’s.
Maximum likelihood estimation of all parameters: later.
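The hand calculation of the MH estimate can be sketched as follows, using the stratum counts from the Diet data table. The tuple layout `(D1a, Y1a, D0a, Y0a)` and the function name `mantel_haenszel_rate_ratio` are illustrative assumptions.

```python
# One tuple per age stratum: (D1a, Y1a, D0a, Y0a),
# i.e. exposed cases and person-years, then unexposed cases and person-years.
STRATA = [
    (2, 311.9, 4, 607.9),     # 40-49 (a = 0)
    (12, 878.1, 5, 1271.1),   # 50-59 (a = 1)
    (14, 667.5, 8, 888.9),    # 60-69 (a = 2)
]

def mantel_haenszel_rate_ratio(strata):
    """Return (theta_MH, Q, R) with Q = sum of Q_a and R = sum of R_a."""
    q = sum(d1 * y0 / (y0 + y1) for d1, y1, d0, y0 in strata)
    r = sum(d0 * y1 / (y0 + y1) for d1, y1, d0, y0 in strata)
    return q / r, q, r

theta_mh, q_total, r_total = mantel_haenszel_rate_ratio(STRATA)
print(f"Q = {q_total:.2f}, R = {r_total:.2f}, theta_MH = {theta_mh:.2f}")
```

On these data the sketch gives an estimate of about 2.40.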
An approximate confidence interval for θ can be obtained using a
standard error for log(θ̂) and then calculating the error factor in the
usual way:
\[
\mathrm{sd}(\log(\theta_{MH})) = \sqrt{\frac{V}{QR}}
\]
where
\[
V = \sum_a V_a = \sum_a (D_{0a} + D_{1a}) \frac{Y_{0a} Y_{1a}}{(Y_{0a} + Y_{1a})^2}.
\]
The Mantel-Haenszel test
The Mantel-Haenszel test for no exposure effect is:
\[
U^2 / V
\]
where
\[
U = \sum_a U_a
\quad\text{and}\quad
U_a = D_{1a} - (D_{0a} + D_{1a}) \frac{Y_{1a}}{Y_{0a} + Y_{1a}}.
\]
When θ = 1, this is approximately $\chi^2_1$-distributed.
(NB: calculations by hand). This test may also be based on the
likelihood principle.
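A sketch of the confidence interval and the test, applied to the stratum counts of the Diet data table. The function name, the tuple layout and the returned dictionary are illustrative; the multiplier 1.645 assumes a 90% interval, and the P-value uses the identity P(χ²₁ > x) = erfc(√(x/2)).

```python
import math

# One tuple per age stratum: (D1a, Y1a, D0a, Y0a).
STRATA = [
    (2, 311.9, 4, 607.9),
    (12, 878.1, 5, 1271.1),
    (14, 667.5, 8, 888.9),
]

def mantel_haenszel_summary(strata, z=1.645):
    """MH estimate, error-factor confidence interval and MH test statistic."""
    q = r = u = v = 0.0
    for d1, y1, d0, y0 in strata:
        y = y0 + y1            # total person-years in the stratum
        d = d0 + d1            # total cases in the stratum
        q += d1 * y0 / y
        r += d0 * y1 / y
        u += d1 - d * y1 / y   # observed minus expected exposed cases
        v += d * y0 * y1 / y ** 2
    theta = q / r
    error_factor = math.exp(z * math.sqrt(v / (q * r)))
    statistic = u ** 2 / v
    return {
        "theta": theta,
        "lower": theta / error_factor,
        "upper": theta * error_factor,
        "statistic": statistic,
        # Upper tail probability of a chi-square variable with 1 df.
        "p_value": math.erfc(math.sqrt(statistic / 2)),
    }

result = mantel_haenszel_summary(STRATA)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

With these inputs the interval is roughly 1.44 to 4.01 and the test statistic is close to 8.5, with P near 0.004.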
Is it reasonable to assume constant rate ratio?
Estimate θ and compute the expected number of unexposed cases
given the total number of cases and the split of risk time between
exposed and unexposed:
\[
E_{0a} = (D_{0a} + D_{1a}) \frac{Y_{0a}}{Y_{0a} + \theta_{MH} Y_{1a}}
\]
(cases should occur in proportion Y0a : θMH Y1a). Then, compute the
“Breslow-Day” test statistic for homogeneity over strata:
\[
\sum_{a=1}^{A} \frac{(D_{0a} - E_{0a})^2}{E_{0a}} \sim \chi^2_{A-1},
\]
(where A is the number of age strata). If this is sufficiently small,
accept that the rate ratio is constant.
The diet data.
• θMH = 2.40,
• 90% c.i. from 1.44 to 4.01,
• MH-test statistic: 8.48 ∼ $\chi^2_1$, P = 0.004,
• Breslow-Day test statistic: 1.65 ∼ $\chi^2_2$, P = 0.44.
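The homogeneity check can be sketched on the Diet data as below. One caveat, stated as an observation about the arithmetic rather than about the author's intent: summing only the unexposed terms (D0a − E0a)²/E0a gives about 0.95 on these data, while adding the analogous terms for the exposed cases, with E1a = (D0a + D1a) − E0a, gives about 1.65, the value reported for the diet data. The function names and tuple layout are illustrative assumptions.

```python
import math

# One tuple per age stratum: (D1a, Y1a, D0a, Y0a).
STRATA = [
    (2, 311.9, 4, 607.9),
    (12, 878.1, 5, 1271.1),
    (14, 667.5, 8, 888.9),
]

def mh_rate_ratio(strata):
    """Mantel-Haenszel estimate of the common rate ratio."""
    q = sum(d1 * y0 / (y0 + y1) for d1, y1, d0, y0 in strata)
    r = sum(d0 * y1 / (y0 + y1) for d1, y1, d0, y0 in strata)
    return q / r

def homogeneity_terms(strata, theta):
    """Sums of (observed - expected)^2 / expected, by exposure group."""
    unexposed_sum = exposed_sum = 0.0
    for d1, y1, d0, y0 in strata:
        total_cases = d0 + d1
        # Cases are expected to split in proportion Y0a : theta * Y1a.
        e0 = total_cases * y0 / (y0 + theta * y1)
        e1 = total_cases - e0
        unexposed_sum += (d0 - e0) ** 2 / e0
        exposed_sum += (d1 - e1) ** 2 / e1
    return unexposed_sum, exposed_sum

theta = mh_rate_ratio(STRATA)
unexposed_only, exposed_only = homogeneity_terms(STRATA, theta)
both_cells = unexposed_only + exposed_only

# With A = 3 strata there are 2 degrees of freedom, and for 2 df
# the chi-square upper tail probability is exactly exp(-x / 2).
p_value = math.exp(-both_cells / 2)

print(f"unexposed terms only: {unexposed_only:.2f}")
print(f"both groups:          {both_cells:.2f}")
print(f"P (2 df, both groups): {p_value:.2f}")
```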
Fixed follow-up time.
If all cohort members are followed for the same time (say, from t0 to
t1) then data from stratum a may be summarized in a (2 × 2)-table:

Group       F(ailure)   S(urvival)    Total
Non-exp.    D0a         n0a − D0a     n0a
Exposed     D1a         n1a − D1a     n1a

M-H estimate and M-H test for an assumed common risk ratio may
be obtained as for the rates replacing Y0a by n0a and Y1a by n1a.
M-H analysis of OR may also be performed.
Cohorts where all are exposed: indirect standardization.
C & H: Sect. 15.6.
When there is no comparison group we may ask:
Do mortality rates in cohort differ from those of an external
population, for example:
• Occupational cohorts
• Patient cohorts
compared with reference rates obtained from:
• Population statistics (mortality rates)
• Disease registers (hospital discharge registers)
Accounting for age composition
• Compare rates in a study group with a standard set of
age–specific rates
• Reference rates are normally based on large numbers of cases, so
they can be assumed to be known
• Calculate “expected” number of cases, $E = \sum_a \lambda_{a0} Y_{1a}$,
if the standard rates had applied in our study group, and compare this
with the observed number of cases, $D = \sum_a D_{1a}$.
• If we use the Mantel-Haenszel estimator when
D0a is large,
Y0a is large,
and D0a/Y0a = λa0,
then θMH = SMR = D/E.
(As Y0a grows, Y0a/(Y0a + Y1a) → 1, so Q → D, and
D0a Y1a/(Y0a + Y1a) → λa0 Y1a, so R → E.)
• Similarly, $\mathrm{sd}(\log[SMR]) = \sqrt{1/D}$
Example: C & H, p.56.
974 women treated with hormone replacement therapy were followed up.
In this cohort 15 incident cases of breast cancer were observed. The
woman–years of observation and corresponding E & W rates were:

Age      person-years   E & W rate per 100 000 py       E
40–44         975               113                   1.10
45–49        1079               162                   1.75
50–54        2161               151                   3.26
55–59        2793               183                   5.11
60–64        3096               179                   5.54
Σ                                                    16.77
• “Expected” cases at ages 40–44:
975 × 113/100 000 = 1.10
• Total “expected” cases is E = 16.77
• The SMR is 15/16.77 = 0.89, or 89%.
• Error-factor: $\exp(1.645 \times \sqrt{1/15}) = 1.53$
• 90% confidence interval is:
0.89 ×/÷ 1.53 = (0.58, 1.36)
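The SMR calculation can be reproduced with a short sketch. The person-years and reference rates are those of the example table; the variable names are illustrative, and the multiplier 1.645 corresponds to the 90% interval used above.

```python
import math

# (age group, woman-years of observation, E & W rate per 100 000 person-years)
REFERENCE = [
    ("40-44", 975, 113),
    ("45-49", 1079, 162),
    ("50-54", 2161, 151),
    ("55-59", 2793, 183),
    ("60-64", 3096, 179),
]
observed_cases = 15

# Expected cases per stratum if the reference rates applied to the cohort.
expected_by_age = {age: years * rate / 100_000 for age, years, rate in REFERENCE}
expected_cases = sum(expected_by_age.values())

smr = observed_cases / expected_cases
error_factor = math.exp(1.645 * math.sqrt(1 / observed_cases))
lower, upper = smr / error_factor, smr * error_factor

for age, expected in expected_by_age.items():
    print(f"{age}: {expected:.2f}")
print(f"E = {expected_cases:.2f}, SMR = {smr:.2f}")
print(f"error factor = {error_factor:.2f}, 90% c.i. = ({lower:.2f}, {upper:.2f})")
```

Carrying full precision through gives an upper limit that rounds to 1.37; the 1.36 quoted above comes from multiplying the already rounded values 0.89 and 1.53.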