Confounding
• Epidemiology relies on observational studies or experiments of nature.
• Often these are poor experiments: there is no control for confounding by extraneous influences.
• Definition: A confounder is a variable whose influence we would have controlled if we had been able to design the natural experiment.

Example: confounding by age, Fig. 14.1
[Figure 14.1: probability trees for unexposed and exposed subjects, branching first on age (<55, 55+) and then on outcome (F = failure, S = survival). The branch probabilities are:]

                      P(<55)   P(55+)   P(F | <55)   P(F | 55+)
Unexposed subjects     0.8      0.2        0.1          0.3
Exposed subjects       0.4      0.6        0.1          0.3

• Probability of failure for unexposed: (0.8 × 0.1) + (0.2 × 0.3) = 0.14
• Probability of failure for exposed: (0.4 × 0.1) + (0.6 × 0.3) = 0.22
• The difference is entirely due to the difference in age structure.
• When there is a true effect, its magnitude can be distorted by such influences.

Confounding when RR = 2
[Figure: probability trees as in Fig. 14.1, now with a true effect of exposure. The branch probabilities are:]

                      P(<55)   P(55+)   P(F | <55)   P(F | 55+)
Unexposed subjects     0.8      0.2        0.1          0.2
Exposed subjects       0.4      0.6        0.2          0.4

Results.
• The true relative risk: $RR_T$ = 0.2/0.1 = 0.4/0.2 = 2
• Probability of failure for unexposed: (0.8 × 0.1) + (0.2 × 0.2) = 0.12
• Probability of failure for exposed: (0.4 × 0.2) + (0.6 × 0.4) = 0.32
• The apparent relative risk: $RR_O$ = 0.32/0.12 = 2.67

Confounding
A confounder is:
• associated with outcome: e.g., older persons have higher disease probability,
• associated with the exposure: e.g., older persons are more / less likely to be exposed,
• not a result of exposure, i.e. not an intermediate variable.
Not a statistical property; cannot be seen from tables; common sense is required!

Confounding: schematically.
A variable C is a potential confounder for the relation E→O if it is
1) related to the exposure: E−C
2) an independent risk factor for the outcome: C→O
3) not a consequence of the exposure: not E→C→O
That is: C is linked to E (E−C) and points to O (C→O), alongside the relation E→O itself.
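The marginal failure probabilities in the two age examples are weighted averages of the age-specific risks, which can be checked in a few lines. This is only a minimal sketch: the function name `marginal_risk` and the variable names are illustrative and do not come from the slides.

```python
def marginal_risk(age_dist, risk_by_age):
    """Marginal failure probability: sum over age groups of P(age) * P(failure | age)."""
    return sum(p_age * p_fail for p_age, p_fail in zip(age_dist, risk_by_age))


# Age distributions over (<55, 55+), taken from the two examples.
AGE_UNEXPOSED = (0.8, 0.2)
AGE_EXPOSED = (0.4, 0.6)

# Scenario 1: no true effect, age-specific risks are identical in both groups.
p0_null = marginal_risk(AGE_UNEXPOSED, (0.1, 0.3))
p1_null = marginal_risk(AGE_EXPOSED, (0.1, 0.3))

# Scenario 2: true relative risk of 2 within each age group.
p0_rr2 = marginal_risk(AGE_UNEXPOSED, (0.1, 0.2))
p1_rr2 = marginal_risk(AGE_EXPOSED, (0.2, 0.4))
rr_observed = p1_rr2 / p0_rr2

print(f"No true effect: unexposed {p0_null:.2f}, exposed {p1_null:.2f}")
print(f"True RR = 2: unexposed {p0_rr2:.2f}, exposed {p1_rr2:.2f}, "
      f"apparent RR {rr_observed:.2f}")
```

In both scenarios the crude comparison differs from the within-age comparison only because the two groups are weighted by different age distributions.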
Confounding.
The problem is that we do not always get a fair comparison between exposed and non-exposed.
[Figure: age composition (Young, Old) of the EXPOSED and NON-EXPOSED groups.]
A randomly selected exposed person tends to be older than a randomly chosen non-exposed person.

Controlling confounding, Sect. 14.2
In controlled experiments there are two ways of controlling confounding:
1. Randomization of subjects to experimental groups so that the distributions of the confounder are the same.
2. Hold the confounder constant.

Standardization is a classical statistical technique for controlling for extraneous variables (in particular: age) in the analysis of an observational study.
1. Direct standardization simulates randomization by equalizing the distribution of extraneous variables.
2. Indirect standardization simulates the second method: holding extraneous variables constant.

We first discuss direct standardization and then later turn to the main ways of “holding the confounder constant”:
• stratified (“Mantel-Haenszel”) analysis
• or (more importantly) regression analysis: logistic, Poisson, Cox.

Direct standardization, Sect. 14.3
1. Estimate age-specific rates (or risks) in each group.
2. Calculate marginal rates (risks) if the age distribution were fixed to that of some agreed standard population. A standard population is another term for a common age-distribution.
3. Direct standardization is good for illustrative purposes as it provides absolute rates.
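The two computational steps of direct standardization can be sketched as a weighted average of age-specific risks. The helper name `direct_standardize` and the 80-20 standard are assumptions made for illustration; the risks are those of the earlier age example, where both groups have the same age-specific failure probabilities.

```python
def direct_standardize(age_specific_risks, standard_weights):
    """Weighted average of age-specific risks using the standard population's age weights."""
    if abs(sum(standard_weights) - 1.0) > 1e-9:
        raise ValueError("standard weights must sum to 1")
    return sum(w * r for w, r in zip(standard_weights, age_specific_risks))


# Step 1: age-specific risks for (<55, 55+); identical in both groups in the age example.
risks = {"unexposed": (0.1, 0.3), "exposed": (0.1, 0.3)}

# Step 2: apply one agreed standard age distribution to both groups.
# The 80-20 standard is a hypothetical second choice, used only for comparison.
for label, weights in [("50-50 standard", (0.5, 0.5)), ("80-20 standard", (0.8, 0.2))]:
    standardized = {g: direct_standardize(r, weights) for g, r in risks.items()}
    print(label, {g: round(v, 2) for g, v in standardized.items()})
```

Whatever common standard is chosen, the two standardized risks coincide here because the age-specific risks are equal; only the absolute level depends on the standard.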
[Figure: the probability trees of Fig. 14.1 again. Unexposed subjects: P(<55) = 0.8, P(55+) = 0.2; exposed subjects: P(<55) = 0.4, P(55+) = 0.6; failure probability 0.1 for <55 and 0.3 for 55+ in both groups.]
Marginal failure probability (with 50-50 age distribution) is (0.5 × 0.1) + (0.5 × 0.3) = 0.2 for both groups.

The Diet data
D = number of cases, Y = person-years, Rate = cases per 1000 person-years, RR = rate ratio (exposed / unexposed).

               Exposed (< 2750 kcal)       Unexposed (≥ 2750 kcal)
Current age     D        Y     Rate         D        Y     Rate       RR
40–49           2    311.9     6.41         4    607.9     6.58     0.97
50–59          12    878.1    13.67         5   1271.1     3.93     3.48
60–69          14    667.5    20.97         8    888.9     9.00     2.33
Total          28   1857.5    15.07        17   2768.9     6.14     2.46

Direct standardization in the diet data.
We can standardize the age-specific rates to a population with equal numbers of person-years in each age group.
Exposed: (1/3 × 6.41) + (1/3 × 13.67) + (1/3 × 20.97) = 13.68
Unexposed: (1/3 × 6.58) + (1/3 × 3.93) + (1/3 × 9.00) = 6.50
Estimate of rate ratio is 13.68/6.50 = 2.10.

Choice of weights
• Sometimes the overall age structure of the whole study is used.
• Use of a standard age structure can facilitate comparison with other work.
• In cancer epidemiology, standard populations approximating the European, US or World population age-distribution are used.
• Equal weights essentially give a comparison between cumulative rates in the two groups.

Stratified (Mantel-Haenszel) analysis, Ch. 15.
• Aim is to hold age constant.
• Compare exposed and unexposed persons within age strata.
• Compute a combined estimate of effect over all strata.
• This implies a model in which there is no (systematic) variation of effect over strata.
• If estimates are similar we combine them, by a suitable average.

If the effect of exposure is the same in all age-strata, we can re-parameterize rates as:

Age      Exposed (low energy)                    Unexposed (high energy)   Rate ratio
40–49    $\lambda_{01} = \theta\lambda_{00}$     $\lambda_{00}$            $\theta$
50–59    $\lambda_{11} = \theta\lambda_{10}$     $\lambda_{10}$            $\theta$
60–69    $\lambda_{21} = \theta\lambda_{20}$     $\lambda_{20}$            $\theta$

This is the proportional hazards model: for every stratum a, $\lambda_{a1} = \theta\lambda_{a0}$.
$\theta$ is the effect of exposure “controlled for” age.
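The diet-data rates and the equal-weights standardization can be recomputed from the case counts and person-years. The stratum-specific rate ratios also give a first impression of how plausible a constant θ is. This is a sketch with illustrative names, not code from the course. Because it works from unrounded rates, the last digit can differ slightly from figures obtained by dividing rounded rates.

```python
# Diet data: (cases D, person-years Y) per age band, as tabulated in the text.
AGE_BANDS = ("40-49", "50-59", "60-69")
EXPOSED = ((2, 311.9), (12, 878.1), (14, 667.5))     # < 2750 kcal
UNEXPOSED = ((4, 607.9), (5, 1271.1), (8, 888.9))    # >= 2750 kcal


def rate_per_1000(cases, person_years):
    """Incidence rate per 1000 person-years."""
    return 1000.0 * cases / person_years


rates_exposed = [rate_per_1000(d, y) for d, y in EXPOSED]
rates_unexposed = [rate_per_1000(d, y) for d, y in UNEXPOSED]

# Stratum-specific rate ratios (exposed / unexposed).
rate_ratios = [r1 / r0 for r1, r0 in zip(rates_exposed, rates_unexposed)]

# Direct standardization with equal weights for the three age bands.
weights = [1 / 3] * 3
std_exposed = sum(w * r for w, r in zip(weights, rates_exposed))
std_unexposed = sum(w * r for w, r in zip(weights, rates_unexposed))
std_rate_ratio = std_exposed / std_unexposed

for band, r1, r0, rr in zip(AGE_BANDS, rates_exposed, rates_unexposed, rate_ratios):
    print(f"{band}: exposed {r1:.2f}, unexposed {r0:.2f}, RR {rr:.2f}")
print(f"Standardized: exposed {std_exposed:.2f}, unexposed {std_unexposed:.2f}, "
      f"RR {std_rate_ratio:.2f}")
```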
The Mantel-Haenszel estimate
Data (in $D_{ea}$ and $Y_{ea}$ the first subscript is the exposure group e, the second the age stratum a):

Age (a)           Exposed, low energy (1)    Unexposed, high energy (0)
40–49 (a = 0)     $D_{10}, Y_{10}$           $D_{00}, Y_{00}$
50–59 (a = 1)     $D_{11}, Y_{11}$           $D_{01}, Y_{01}$
60–69 (a = 2)     $D_{12}, Y_{12}$           $D_{02}, Y_{02}$

The MH-estimate for θ is the weighted average:
\[
\theta_{MH} = \frac{\sum_a D_{1a} Y_{0a} / (Y_{0a} + Y_{1a})}{\sum_a D_{0a} Y_{1a} / (Y_{0a} + Y_{1a})} = \frac{\sum_a Q_a}{\sum_a R_a} = \frac{Q}{R}.
\]
This may be calculated by hand. Note that only θ is estimated, not the λ's. Maximum likelihood estimation of all parameters: later.

An approximate confidence interval for θ can be obtained by using a standard error for log(θ̂) and then calculating the error factor in the usual way:
\[
\mathrm{sd}(\log \theta_{MH}) = \sqrt{\frac{V}{QR}}, \qquad V = \sum_a V_a = \sum_a (D_{0a} + D_{1a}) \frac{Y_{0a} Y_{1a}}{(Y_{0a} + Y_{1a})^2}.
\]
(NB: calculations by hand).

The Mantel-Haenszel test
The Mantel-Haenszel test for no exposure effect is $U^2/V$, where
\[
U = \sum_a U_a, \qquad U_a = D_{1a} - (D_{0a} + D_{1a}) \frac{Y_{1a}}{Y_{0a} + Y_{1a}}.
\]
This test may also be based on the likelihood principle. When θ = 1, this is approximately $\chi^2_1$-distributed.

Is it reasonable to assume constant rate ratio?
Estimate θ and compute the expected number of unexposed cases given the total number of cases and the split of risk time between exposed and unexposed:
\[
E_{0a} = (D_{0a} + D_{1a}) \frac{Y_{0a}}{Y_{0a} + \theta_{MH} Y_{1a}}
\]
(cases should occur in proportion $Y_{0a} : \theta_{MH} Y_{1a}$). The expected number of exposed cases is then $E_{1a} = (D_{0a} + D_{1a}) - E_{0a}$.
Then compute the “Breslow-Day” test statistic for homogeneity over strata:
\[
\sum_a \sum_{e=0}^{1} \frac{(D_{ea} - E_{ea})^2}{E_{ea}} \sim \chi^2_{A-1}
\]
(where A is the number of age strata). If this is sufficiently small, accept that the rate ratio is constant.

The diet data.
• $\theta_{MH}$ = 2.40,
• 90% c.i. from 1.44 to 4.01,
• MH-test statistic: 8.48 ∼ $\chi^2_1$, P = 0.004,
• Breslow-Day test statistic: 1.65 ∼ $\chi^2_2$, P = 0.44.

Fixed follow-up time.
If all cohort members are followed for the same time (say, from $t_0$ to $t_1$) then data from stratum a may be summarized in a (2 × 2)-table:

Group       F(ailure)    S(urvival)            Total
Non-exp.    $D_{0a}$     $n_{0a} - D_{0a}$     $n_{0a}$
Exposed     $D_{1a}$     $n_{1a} - D_{1a}$     $n_{1a}$
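The Mantel-Haenszel calculations for rates described above are short enough to sketch with the standard library alone, using the diet data (cases and person-years per age stratum). Function and variable names are illustrative. The homogeneity statistic here sums (observed − expected)²/expected over both exposure groups; that choice is an assumption of this sketch, made because it reproduces the value of about 1.65 reported for the diet data. The chi-squared tail probabilities use closed forms that hold only for 1 and 2 degrees of freedom.

```python
import math

# Diet data per age stratum: (D1, Y1, D0, Y0) = exposed cases, exposed person-years,
# unexposed cases, unexposed person-years.
STRATA = [
    (2, 311.9, 4, 607.9),     # 40-49
    (12, 878.1, 5, 1271.1),   # 50-59
    (14, 667.5, 8, 888.9),    # 60-69
]


def mantel_haenszel(strata, z=1.645):
    """Return the MH rate ratio, its confidence interval (z=1.645 gives 90%),
    and the MH test statistic U^2 / V."""
    q = sum(d1 * y0 / (y0 + y1) for d1, y1, d0, y0 in strata)
    r = sum(d0 * y1 / (y0 + y1) for d1, y1, d0, y0 in strata)
    v = sum((d0 + d1) * y0 * y1 / (y0 + y1) ** 2 for d1, y1, d0, y0 in strata)
    u = sum(d1 - (d0 + d1) * y1 / (y0 + y1) for d1, y1, d0, y0 in strata)
    theta = q / r
    error_factor = math.exp(z * math.sqrt(v / (q * r)))
    return theta, (theta / error_factor, theta * error_factor), u * u / v


def homogeneity_statistic(strata, theta):
    """Sum of (observed - expected)^2 / expected over both exposure groups and all strata."""
    total = 0.0
    for d1, y1, d0, y0 in strata:
        e0 = (d0 + d1) * y0 / (y0 + theta * y1)   # expected unexposed cases
        e1 = (d0 + d1) - e0                       # expected exposed cases
        total += (d0 - e0) ** 2 / e0 + (d1 - e1) ** 2 / e1
    return total


theta_mh, (ci_low, ci_high), mh_stat = mantel_haenszel(STRATA)
bd_stat = homogeneity_statistic(STRATA, theta_mh)

# Closed-form chi-squared upper tail probabilities.
p_mh = math.erfc(math.sqrt(mh_stat / 2))   # 1 degree of freedom
p_bd = math.exp(-bd_stat / 2)              # 2 degrees of freedom (three strata)

print(f"theta_MH = {theta_mh:.2f}, 90% c.i. {ci_low:.2f} to {ci_high:.2f}")
print(f"MH test statistic = {mh_stat:.2f}, P = {p_mh:.3f}")
print(f"Homogeneity statistic = {bd_stat:.2f}, P = {p_bd:.2f}")
```

The MH test statistic computed from the tabulated person-years comes out close to, but not exactly at, the quoted 8.48; small discrepancies of this kind are expected when inputs have been rounded.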
With fixed follow-up, the M-H estimate and M-H test for an assumed common risk ratio may be obtained as for the rates, replacing $Y_{0a}$ by $n_{0a}$ and $Y_{1a}$ by $n_{1a}$. M-H analysis of the OR may also be performed.

Cohorts where all are exposed: indirect standardization. C & H: Sect. 15.6.
When there is no comparison group we may ask: do mortality rates in the cohort differ from those of an external population? Examples of such cohorts are:
• Occupational cohorts
• Patient cohorts
compared with reference rates obtained from:
• Population statistics (mortality rates)
• Disease registers (hospital discharge registers)

Accounting for age composition
• Compare rates in a study group with a standard set of age-specific rates.
• Reference rates are normally based on large numbers of cases, so they can be assumed to be known.
• Calculate the “expected” number of cases, $E = \sum_a \lambda_{a0} Y_{1a}$, if the standard rates had applied in our study group, and compare this with the observed number of cases, $D = \sum_a D_{1a}$.
• If we use the Mantel-Haenszel estimator when $D_{0a}$ and $Y_{0a}$ are large, with $D_{0a} = \lambda_{a0} Y_{0a}$, then $Y_{0a}/(Y_{0a} + Y_{1a}) \to 1$ and $D_{0a} Y_{1a}/(Y_{0a} + Y_{1a}) \to \lambda_{a0} Y_{1a}$, so that $\theta_{MH} = \mathrm{SMR} = D/E$.
• Similarly, $\mathrm{sd}(\log[\mathrm{SMR}]) = \sqrt{1/D}$.

Example: C & H, p.56.
974 women treated with hormone replacement therapy were followed up. In this cohort 15 incident cases of breast cancer were observed. The woman-years of observation and corresponding England & Wales (E & W) rates were:

Age      Person-years    E & W rate per 100 000 py    E
40–44         975                 113                  1.10
45–49        1079                 162                  1.75
50–54        2161                 151                  3.26
55–59        2793                 183                  5.11
60–64        3096                 179                  5.54
Total                                                 16.77

• “Expected” cases at ages 40–44: 975 × 113/100 000 = 1.10
• Total “expected” cases is E = 16.77
• The SMR is 15/16.77 = 0.89, or 89%.
• Error-factor: $\exp(1.645 \times \sqrt{1/15}) = 1.53$
• 90% confidence interval is: 0.89 ×/÷ 1.53 = (0.58, 1.36)
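The SMR calculation in this example can be reproduced as follows. This is a minimal sketch in which `smr_with_ci` and the other names are illustrative. It carries unrounded intermediate values, so the interval limits can differ in the last digit from limits computed from the rounded 0.89 and 1.53.

```python
import math

# Hormone replacement therapy cohort:
# (age band, woman-years, reference rate per 100 000 person-years).
COHORT = [
    ("40-44", 975, 113),
    ("45-49", 1079, 162),
    ("50-54", 2161, 151),
    ("55-59", 2793, 183),
    ("60-64", 3096, 179),
]
OBSERVED_CASES = 15


def smr_with_ci(observed, cohort, z=1.645):
    """Return expected cases, SMR, error factor and confidence interval.

    z = 1.645 gives a 90% interval; sd(log SMR) is taken as sqrt(1 / observed).
    """
    expected = sum(py * rate / 100_000 for _, py, rate in cohort)
    smr = observed / expected
    error_factor = math.exp(z * math.sqrt(1 / observed))
    return expected, smr, error_factor, (smr / error_factor, smr * error_factor)


expected, smr, error_factor, (ci_low, ci_high) = smr_with_ci(OBSERVED_CASES, COHORT)

print(f"E = {expected:.2f}, SMR = {smr:.2f}, error factor = {error_factor:.2f}")
print(f"90% c.i.: {ci_low:.2f} to {ci_high:.2f}")
```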