Graphical Causal Models References
Causality in Econometrics (3)
Alessio Moneta
Max Planck Institute of Economics
Jena
moneta@econ.mpg.de
26 April 2011
GSBC Lecture
Friedrich-Schiller-Universität Jena
Introduction (In)dependence Probabilistic Inference
Graphical Causal Models
Terminology and Representation of Statistical Dependence
Sources and Motivations
▸ The graphical-models approach to causal inference was mainly developed by:
• Spirtes, Glymour, Scheines (2000), Causation, Prediction, and Search, 2nd edition.
• Pearl (2000), Causality: Models, Reasoning, and Inference.
▸ Forerunners:
• J.S. Mill
• C. Spearman
• T. Haavelmo, H. Wold, H. Simon
• H. Reichenbach, P. Suppes
Sources and Motivations
▸ Ideas:
• Use of probability and diagrams to represent associations in the data
• Use of graph theory to represent and analyze causal relations
• This permits, in particular:
  • addressing the symmetry problem, typical of probabilistic approaches
  • representing structures in which interventions are possible
• Formalization of the relationship between probabilistic and causal representation
• Emphasis on inference and agnosticism about causal ontology, but with many points of contact with:
  • the probabilistic approach (Reichenbach)
  • manipulability theory (Woodward).
Formal preliminaries
▸ Graph: < V, M, E >
• set V of vertices (or nodes), which represent variables.
• set M of marks, such as ‘>’, ‘−’ (or EM ≡ empty mark) and ‘o’, which represent the directions of causal influences.
• set E of edges, which are pairs of the form {[V1, M1], [V2, M2]} and represent causal relationships.
[Figure: graph G over V1, V2, V3 with edges V1 → V2, V1 − V3 and V3 → V2]
G: < {V1, V2, V3}, {EM, >}, {{[V1, EM], [V2, >]}, {[V1, EM], [V3, EM]}, {[V3, EM], [V2, >]}} >
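The triple < V, M, E > can be written down almost literally as a data structure. The following is a minimal sketch, assuming Python as the illustration language; the names `EM`, `ARROW`, `edge` and `is_directed` are hypothetical helpers and not part of the lecture.

```python
# Marks: EM is the empty mark, ARROW is the arrowhead '>'.
EM, ARROW = "EM", ">"

def edge(v1, m1, v2, m2):
    """An edge is an unordered pair of (vertex, mark) endpoints."""
    return frozenset({(v1, m1), (v2, m2)})

# The graph G from the slide, written as the triple <V, M, E>.
V = {"V1", "V2", "V3"}
M = {EM, ARROW}
E = {
    edge("V1", EM, "V2", ARROW),  # V1 -> V2
    edge("V1", EM, "V3", EM),     # V1 - V3 (undirected)
    edge("V3", EM, "V2", ARROW),  # V3 -> V2
}
G = (V, M, E)

def is_directed(e):
    """True if exactly one endpoint of the edge carries an arrowhead."""
    marks = sorted(mark for _, mark in e)
    return marks == sorted([EM, ARROW])

directed_edges = {e for e in E if is_directed(e)}
```

Storing each edge as a set of (vertex, mark) pairs mirrors the notation {[V1, M1], [V2, M2]}: the direction is carried by the marks, not by the order of the endpoints.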
Formal preliminaries
▸ Undirected graph:
• graph in which the set of marks is M = {EM}
▸ Directed graph:
• graph in which the set of marks is M = {EM, >} and the marks of every edge in E are always EM, >
▸ Directed edge: A → B (≡ {[A, EM], [B, >]})
• A: parent, B: child (a descendant of A).
Formal preliminaries
▸ Path:
• undirected path: a sequence of vertices A, . . . , B such that every pair of vertices X, Y adjacent in the sequence is connected by an edge {[X, M1], [Y, M2]}.
• directed path: a sequence of vertices A, . . . , B such that every pair of vertices X, Y adjacent in the sequence is connected by an edge {[X, EM], [Y, >]}.
• acyclic path: a path that contains no vertex more than once; otherwise the path is cyclic.
Example
[Figure: graph over V1, . . . , V5 with edges V1 → V2, V3 → V2, V2 → V4, V4 → V5 and an edge between V1 and V3]
• Directed paths: < V1 , V2 , V4 , V5 >; < V3 , V2 , V4 , V5 >;
< V2 , V4 , V5 >, etc.
• Undirected paths: < V1 , V3 , V2 , V4 , V5 >; < V1 , V2 , V3 >, etc.
• Undirected cyclic path: < V1 , V2 , V3 , V1 >
• No directed cyclic paths.
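The path statements above can be checked mechanically. The sketch below is an illustration only: it encodes the directed edges V1 → V2, V3 → V2, V2 → V4, V4 → V5 that the listed paths imply, leaves out the edge between V1 and V3 (assumed here to carry no arrowhead, so it cannot lie on a directed path), and uses the hypothetical helper names `directed_paths` and `has_directed_cycle`.

```python
# Directed edges of the example graph, stored as parent -> list of children.
children = {
    "V1": ["V2"],
    "V3": ["V2"],
    "V2": ["V4"],
    "V4": ["V5"],
    "V5": [],
}

def directed_paths(start, end, path=None):
    """Yield every acyclic directed path from start to end."""
    path = (path or []) + [start]
    if start == end:
        yield path
        return
    for child in children[start]:
        if child not in path:
            yield from directed_paths(child, end, path)

def has_directed_cycle():
    """True if some vertex can be reached from itself along directed edges."""
    def reachable(src):
        seen, stack = set(), list(children[src])
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                stack.extend(children[v])
        return seen
    return any(v in reachable(v) for v in children)

paths = list(directed_paths("V1", "V5"))
```

Here `paths` contains the single directed path < V1, V2, V4, V5 >, and `has_directed_cycle()` returns `False`, in line with the last bullet.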
More terminology
▸ Collider: vertex V such that A → V ← B
▸ Unshielded collider: vertex V such that A → V ← B and A and B are not adjacent (i.e. not connected by an edge) in the graph
▸ Complete graph: graph in which every pair of vertices is adjacent
▸ Directed Acyclic Graph (DAG): directed graph that contains no directed cyclic paths
▸ Directed Cyclic Graph (DCG): directed graph that contains directed cyclic paths
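To make the difference between a collider and an unshielded collider concrete, here is a small sketch. The two toy graphs and the function name `colliders` are hypothetical; a directed graph is assumed to be stored as a mapping from each vertex to the set of its parents.

```python
from itertools import combinations

def colliders(parents, unshielded_only=False):
    """Return triples (A, V, B) such that A -> V <- B.

    With unshielded_only=True, triples in which A and B are adjacent
    are dropped.
    """
    found = []
    for v, pa in parents.items():
        for a, b in combinations(sorted(pa), 2):
            adjacent = a in parents.get(b, ()) or b in parents.get(a, ())
            if unshielded_only and adjacent:
                continue
            found.append((a, v, b))
    return found

# Hypothetical graphs: in `shielded`, A and B are also connected (A -> B).
unshielded = {"A": set(), "B": set(), "V": {"A", "B"}}
shielded = {"A": set(), "B": {"A"}, "V": {"A", "B"}}
```

In both graphs V is a collider, but only in `unshielded` is it an unshielded collider.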
Graphs and probabilistic dependence
▸ First use of graphs: representation of probabilistic dependence and independence
▸ Nodes: random variables (discrete or continuous).
▸ Edges: probabilistic dependence.
▸ Bayesian networks (Pearl 1985).
Conditional Independence
▸ If X, Y, Z are random variables, we say that X is conditionally independent of Y given Z, and write
$X \perp\!\!\!\perp Y \mid Z$    (1)
if
• for discrete variables:
$P(X = x, Y = y \mid Z = z) = P(X = x \mid Z = z)\,P(Y = y \mid Z = z)$
• for continuous variables:
$f_{XY|Z}(x, y \mid z) = f_{X|Z}(x \mid z)\,f_{Y|Z}(y \mid z)$
• We can also write (simplifying the notation):
$X \perp\!\!\!\perp Y \mid Z \iff f(x, y, z)\,f(z) = f(x, z)\,f(y, z)$
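For a discrete distribution the last criterion can be checked cell by cell. The following sketch uses a hypothetical joint distribution over three binary variables, constructed so that X and Y are independent given Z; the probability values and the helper names `marginal` and `cond_indep` are illustrative assumptions.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical joint distribution built as P(x, y, z) = P(z) P(x|z) P(y|z).
p_z = {0: F(1, 2), 1: F(1, 2)}
p_x_given_z = {0: {0: F(1, 4), 1: F(3, 4)}, 1: {0: F(2, 3), 1: F(1, 3)}}
p_y_given_z = {0: {0: F(1, 5), 1: F(4, 5)}, 1: {0: F(1, 2), 1: F(1, 2)}}
joint = {
    (x, y, z): p_z[z] * p_x_given_z[z][x] * p_y_given_z[z][y]
    for x, y, z in product((0, 1), repeat=3)
}

def marginal(joint, keep):
    """Sum the joint over all coordinates whose index is not in `keep`."""
    out = {}
    for values, p in joint.items():
        key = tuple(values[i] for i in keep)
        out[key] = out.get(key, 0) + p
    return out

def cond_indep(joint):
    """Check f(x, y, z) f(z) == f(x, z) f(y, z) for every cell (x, y, z)."""
    f_z = marginal(joint, [2])
    f_xz = marginal(joint, [0, 2])
    f_yz = marginal(joint, [1, 2])
    return all(
        p * f_z[(z,)] == f_xz[(x, z)] * f_yz[(y, z)]
        for (x, y, z), p in joint.items()
    )
```

Exact fractions avoid rounding issues in the equality test. In this example `cond_indep(joint)` holds although X and Y are not marginally independent, which shows that the two notions differ.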
Conditional independence
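▸ Some equalities:
• $X \perp\!\!\!\perp Y \mid Z \iff f(x, y \mid z) = f(x \mid z)\,f(y \mid z)$
• $X \perp\!\!\!\perp Y \mid Z \iff f(x, y, z)\,f(z) = f(x, z)\,f(y, z)$
• $X \perp\!\!\!\perp Y \mid Z \iff f(x \mid y, z) = f(x \mid z)$
• $X \perp\!\!\!\perp Y \mid Z \iff f(x, z \mid y) = f(x \mid z)\,f(z \mid y)$
• $X \perp\!\!\!\perp Y \mid Z \iff f(x, y, z) = f(x \mid z)\,f(y, z)$
Note: $f(x, y \mid z) = f(x, y, z)/f(z)$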
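As a worked step (a sketch, assuming all densities involved are strictly positive), the equivalence between the first and the third equality follows directly from the definition of a conditional density:

```latex
\begin{align*}
f(x \mid y, z)
  &= \frac{f(x, y, z)}{f(y, z)}
   = \frac{f(x, y \mid z)\, f(z)}{f(y \mid z)\, f(z)}
   = \frac{f(x, y \mid z)}{f(y \mid z)} .
\end{align*}
```

Hence $f(x \mid y, z) = f(x \mid z)$ holds exactly when $f(x, y \mid z) = f(x \mid z)\, f(y \mid z)$.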
Conditional independence
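▸ It also holds that:
• $X \perp\!\!\!\perp Y \mid Z \iff Y \perp\!\!\!\perp X \mid Z$ (symmetry)
• If Z is empty (trivial), we write $X \perp\!\!\!\perp Y$: X is independent of Y.
▸ Other properties:
• $X \perp\!\!\!\perp YW \mid Z \implies X \perp\!\!\!\perp Y \mid Z$ (decomposition)
• $X \perp\!\!\!\perp YW \mid Z \implies X \perp\!\!\!\perp Y \mid ZW$ (weak union)
See Pearl 2000: 11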
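A short sketch of why decomposition holds in the continuous case (assuming the relevant densities exist; in the discrete case the integral becomes a sum): starting from $X \perp\!\!\!\perp YW \mid Z$, i.e. $f(x, y, w \mid z) = f(x \mid z)\, f(y, w \mid z)$, integrate out $w$.

```latex
\begin{align*}
f(x, y \mid z)
  &= \int f(x, y, w \mid z)\, dw
   = \int f(x \mid z)\, f(y, w \mid z)\, dw \\
  &= f(x \mid z) \int f(y, w \mid z)\, dw
   = f(x \mid z)\, f(y \mid z) .
\end{align*}
```

The last expression is exactly the defining factorization of $X \perp\!\!\!\perp Y \mid Z$.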
Interpretations of C.I.
▸ Useful interpretations of the C.I. statement $X \perp\!\!\!\perp Y \mid Z$:
• once we know Z, learning the value of Y does not provide additional information about X.
• once we know Z, reading X is irrelevant for reading Y.
• once we observe realizations of Z, observing realizations of Y is irrelevant for predicting the frequency of the realizations of X.
Independence and uncorrelatedness
▸ It is important to distinguish between (conditional) independence and (conditional or partial) correlation.
• Recall:
▸ Variance of X:
$\sigma_X^2 := E[(X - E(X))^2]$
▸ Covariance between X and Y:
$\sigma_{XY} := E[(X - E(X))(Y - E(Y))]$
▸ Correlation coefficient (Pearson):
$\rho_{XY} := \dfrac{\sigma_{XY}}{\sigma_X \sigma_Y}$
▸ Linear regression coefficient:
$r_{XY} := \dfrac{\sigma_{XY}}{\sigma_Y^2} = \rho_{XY}\,\dfrac{\sigma_X}{\sigma_Y}$
▸ This suggests that correlation is a measure of linear dependence
▸ Notice: $\sigma_{XY} = \sigma_{YX}$ and $\rho_{XY} = \rho_{YX}$, but in general $r_{XY} \neq r_{YX}$
Independence and uncorrelatedness
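• Recall:
▸ Partial correlation between X and Y given Z:
$\rho_{XY.Z} = \dfrac{\rho_{XY} - \rho_{YZ}\,\rho_{XZ}}{\sqrt{1 - \rho_{XZ}^2}\,\sqrt{1 - \rho_{YZ}^2}}$
▸ Conditional independence $X \perp\!\!\!\perp Y \mid Z$:
$f_{XY|Z}(x, y \mid z) = f_{X|Z}(x \mid z)\,f_{Y|Z}(y \mid z)$
▸ It holds:
• $X \perp\!\!\!\perp Y \implies \rho_{XY} = 0$
• $X \perp\!\!\!\perp Y \mid Z \implies \rho_{XY.Z} = 0$
▸ and (of course):
• $\rho_{XY} \neq 0 \implies X \not\perp\!\!\!\perp Y$
• $\rho_{XY.Z} \neq 0 \implies X \not\perp\!\!\!\perp Y \mid Z$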
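The partial-correlation formula is easy to evaluate once the three pairwise correlations are known. The sketch below assumes a hypothetical linear setting, X = Z + e1 and Y = Z + e2 with Z, e1, e2 mutually independent and of unit variance, for which the population correlations are ρXY = 1/2 and ρXZ = ρYZ = 1/√2; the function name `partial_corr` is illustrative.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """Partial correlation of X and Y given Z from the pairwise correlations."""
    return (r_xy - r_yz * r_xz) / (
        math.sqrt(1 - r_xz ** 2) * math.sqrt(1 - r_yz ** 2)
    )

# Hypothetical population values for X = Z + e1, Y = Z + e2.
r_xy = 0.5
r_xz = r_yz = 1 / math.sqrt(2)
rho_xy_given_z = partial_corr(r_xy, r_xz, r_yz)
```

X and Y are correlated (ρXY = 1/2) only through their common component Z, so the partial correlation given Z vanishes up to floating-point error.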
Independence and uncorrelatedness
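▸ In general:
• $\rho_{XY} = 0 \nRightarrow X \perp\!\!\!\perp Y$
• $\rho_{XY.Z} = 0 \nRightarrow X \perp\!\!\!\perp Y \mid Z$
▸ However, if the joint distribution F(XYZ) is normal:
• $\rho_{XY} = 0 \implies X \perp\!\!\!\perp Y$
• $\rho_{XY.Z} = 0 \implies X \perp\!\!\!\perp Y \mid Z$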
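A standard counterexample for the general case, sketched here with hypothetical variables: let X be uniform on {−1, 0, 1} and Y = X². Then Y is a deterministic function of X, yet the covariance is exactly zero.

```python
from fractions import Fraction as F

# Hypothetical example: X uniform on {-1, 0, 1} and Y = X**2.
support = [(-1, 1), (0, 0), (1, 1)]   # (x, y) pairs, each with probability 1/3
p = F(1, 3)

e_x = sum(p * x for x, _ in support)
e_y = sum(p * y for _, y in support)
e_xy = sum(p * x * y for x, y in support)
cov_xy = e_xy - e_x * e_y             # zero, hence rho_XY = 0

p_x0 = p                               # P(X = 0)
p_y0 = p                               # P(Y = 0), since Y = 0 only when X = 0
p_x0_y0 = p                            # P(X = 0, Y = 0)
independent = p_x0_y0 == p_x0 * p_y0   # 1/3 versus 1/9
```

The dependence is purely non-linear, which is why the correlation coefficient does not detect it; the joint distribution here is of course not normal.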
Population and sample
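▸ Notice also the difference between population parameters and sample statistics:
$\rho_{XY} = \dfrac{\sigma_{XY}}{\sigma_X \sigma_Y}$, $\quad \hat\rho_{XY} = \dfrac{\sum_{k=1}^{n} (X_k - \bar X)(Y_k - \bar Y)}{\sqrt{\sum_{k=1}^{n} (X_k - \bar X)^2 \, \sum_{k=1}^{n} (Y_k - \bar Y)^2}}$
$r_{YX} = \dfrac{\sigma_{XY}}{\sigma_X^2}$, $\quad \hat r_{YX} = \dfrac{\sum_{k=1}^{n} (X_k - \bar X)(Y_k - \bar Y)}{\sum_{k=1}^{n} (X_k - \bar X)^2}$
$\hat\beta_{OLS} = (X'X)^{-1} X'Y$,
for vectors of data $X \equiv (X_1, \dots, X_n)'$, $Y \equiv (Y_1, \dots, Y_n)'$, and where $\bar X = n^{-1} \sum_{i=1}^{n} X_i$.
Notice that when $\bar X = 0$ and $\bar Y = 0$, $\hat r_{YX} = \hat\beta_{OLS}$.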
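The sample formulas can be evaluated by hand on a small data set. The sketch below uses hypothetical data whose sample means are both zero and a single regressor without intercept, so that (X'X)⁻¹X'Y reduces to a ratio of sums; the function names are illustrative.

```python
import math

def sample_stats(xs, ys):
    """Return (rho_hat_XY, r_hat_YX) as defined on the slide."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    rho_hat = s_xy / math.sqrt(s_xx * s_yy)
    r_yx_hat = s_xy / s_xx
    return rho_hat, r_yx_hat

def ols_no_intercept(xs, ys):
    """(X'X)^{-1} X'Y for a single regressor and no intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Hypothetical data with zero sample means.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [-3.0, -1.0, 0.0, 0.0, 4.0]
rho_hat, r_yx_hat = sample_stats(xs, ys)
beta_hat = ols_no_intercept(xs, ys)
```

Because both sample means are zero, `r_yx_hat` and `beta_hat` coincide, as stated in the last line of the slide.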
Other concepts related to independence
▸ If, for the r.v. X and Y, the moments satisfy $E(X^k) < \infty$ and $E(Y^m) < \infty$, it turns out that $X \perp\!\!\!\perp Y$ iff
$E(X^k Y^m) = E(X^k)\,E(Y^m)$, for all k, m = 1, 2, . . .
▸ X and Y are (k, m)-order dependent iff
$E(X^k Y^m) \neq E(X^k)\,E(Y^m)$, for any k, m = 1, 2, . . .
▸ (1-1)-order linear dependence:
$E(XY) \neq E(X)\,E(Y)$
▸ (1-1)-order independence:
$E(XY) = E(X)\,E(Y) \iff E\{[X - E(X)][Y - E(Y)]\} = 0 \iff \sigma_{XY} = 0 \iff \rho_{XY} = 0$
▸ Orthogonality:
$E(XY) = 0$
▸ Note:
1. if X and Y are uncorrelated ($\rho_{XY} = 0$), this is equivalent to saying that their mean deviations are orthogonal (if X and Y are “centered” by subtracting their means, they become orthogonal).
2. if X and Y are orthogonal, $\rho_{XY} = 0$ only if $E(X) = 0$ or $E(Y) = 0$.
Other concepts related to independence
▸ r-th order independence:
$E(Y^r \mid X = x) = E(Y^r)$ for all $x \in R_X$
▸ In summary:
independence $\implies$ 1st-order independence $\implies$ non-correlation $\iff$ orthogonality of the mean-subtracted variables
non-correlation $\nRightarrow$ independence (there could be non-linear dependencies!)
(cf. Spanos 1999: 272-279)
Statistical model
▸ Importance of defining a statistical model.
▸ Typical statistical model for a set X of n continuous random variables:
• Probability model: defines a family of density functions f(x; θ) over the range of values of X;
• Sampling model: X (a (T × n) matrix of data) is a random sample.
(cf. Spanos 1999: 33)
The Markov Condition
▸ The Markov condition permits the representation of probabilistic dependence through a DAG. In particular, it imposes a relationship between the Bayesian network (a DAG in which the nodes are random variables) and the probabilistic structure.
• A directed acyclic graph G over V (set of vertices) and a probability distribution P(V) satisfy the Markov condition iff for every W ∈ V,
$W \perp\!\!\!\perp V \setminus (\mathrm{Descendants}(W) \cup \mathrm{Parents}(W))$ given $\mathrm{Parents}(W)$.
(Spirtes et al. 2000: 11)
• or, in other words: any vertex (node) is conditionally independent of its nondescendants (except its parents), given its parents.
Markov Condition (example)
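[Figure: DAG over V1, . . . , V5 with edges V3 → V1, V1 → V2, V3 → V2, V2 → V4 and V4 → V5]
• The DAG above and the probability distribution P(V1, V2, V3, V4, V5) satisfy the M.C. iff:
(1) $V_4 \perp\!\!\!\perp \{V_1, V_3\} \mid V_2$
(2) $V_5 \perp\!\!\!\perp \{V_1, V_2, V_3\} \mid V_4$
• Notice that many other C.I. relations follow from (1) and (2) by applying symmetry, decomposition, and weak union (see the slide on the properties of conditional independence). For example:
• $\{V_1, V_3\} \perp\!\!\!\perp V_4 \mid V_2$; $V_1 \perp\!\!\!\perp V_4 \mid V_2$; $V_3 \perp\!\!\!\perp V_4 \mid V_2$; $V_1 \perp\!\!\!\perp V_4 \mid \{V_2, V_3\}$; etc.
• $\{V_1, V_2, V_3\} \perp\!\!\!\perp V_5 \mid V_4$; $V_5 \perp\!\!\!\perp \{V_1, V_2\} \mid V_4$; etc.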
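Statements (1) and (2) can be derived mechanically from the graph. The sketch below assumes the edge set V3 → V1, V1 → V2, V3 → V2, V2 → V4, V4 → V5, which is the one consistent with the factorization given for this DAG in the lecture, and it counts a vertex among its own descendants so that W itself is removed from the set; the function names are hypothetical.

```python
# Example DAG stored as child -> set of parents (edge set assumed as stated).
parents = {
    "V1": {"V3"},
    "V2": {"V1", "V3"},
    "V3": set(),
    "V4": {"V2"},
    "V5": {"V4"},
}

def descendants(v):
    """Descendants of v, including v itself (a convention assumed here)."""
    found, stack = {v}, [v]
    while stack:
        u = stack.pop()
        for child, pa in parents.items():
            if u in pa and child not in found:
                found.add(child)
                stack.append(child)
    return found

def markov_statements():
    """Map W to (V minus (Descendants(W) ∪ Parents(W)), Parents(W))."""
    out = {}
    for w in parents:
        rest = set(parents) - descendants(w) - parents[w]
        if rest:  # skip vertices for which the condition is vacuous
            out[w] = (rest, parents[w])
    return out

statements = markov_statements()
```

For V1, V2 and V3 the set of nondescendants other than parents is empty, so the Markov condition yields non-trivial statements only for V4 and V5, matching (1) and (2).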
Markov condition (factorization)
▸ The M.C. permits the following factorization:
• discrete case: $P(V_1, \dots, V_n) = \prod_{i=1}^{n} P(V_i \mid \mathrm{Parents}(V_i))$, where $P(V_i \mid \mathrm{Parents}(V_i)) = P(V_i)$ if $\mathrm{Parents}(V_i) = \emptyset$
• continuous case: $f(V_1, \dots, V_n) = \prod_{i=1}^{n} f(V_i \mid \mathrm{Parents}(V_i))$, where $f(V_i \mid \mathrm{Parents}(V_i)) = f(V_i)$ if $\mathrm{Parents}(V_i) = \emptyset$
[Figure: DAG over V1, . . . , V5 with edges V3 → V1, V1 → V2, V3 → V2, V2 → V4 and V4 → V5]
• We have: $P(V_1, V_2, V_3, V_4, V_5) = P(V_1 \mid V_3)\,P(V_2 \mid V_1, V_3)\,P(V_3)\,P(V_4 \mid V_2)\,P(V_5 \mid V_4)$
Recall the chain rule: in general $P(V_1, \dots, V_n) = P(V_n \mid V_{n-1}, \dots, V_2, V_1) \cdots P(V_2 \mid V_1)\,P(V_1)$
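The factorization can be turned into a small computation. The sketch below assumes binary variables and hypothetical conditional probability tables (all numbers are illustrative); it builds the joint distribution as the product of the five factors above.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical probabilities of the value 1 for each binary variable.
p_v3 = F(1, 2)
p_v1 = {0: F(1, 4), 1: F(3, 4)}                   # indexed by v3
p_v2 = {(0, 0): F(1, 10), (0, 1): F(1, 2),
        (1, 0): F(2, 5), (1, 1): F(9, 10)}        # indexed by (v1, v3)
p_v4 = {0: F(1, 5), 1: F(7, 10)}                  # indexed by v2
p_v5 = {0: F(1, 3), 1: F(2, 3)}                   # indexed by v4

def bern(p_one, value):
    """P(V = value) for a binary variable with P(V = 1) = p_one."""
    return p_one if value == 1 else 1 - p_one

def joint(v1, v2, v3, v4, v5):
    """P(V1|V3) P(V2|V1,V3) P(V3) P(V4|V2) P(V5|V4)."""
    return (bern(p_v1[v3], v1) * bern(p_v2[(v1, v3)], v2) * bern(p_v3, v3)
            * bern(p_v4[v2], v4) * bern(p_v5[v4], v5))

# The factors define a proper distribution: the joint sums to one.
total = sum(joint(*vals) for vals in product((0, 1), repeat=5))
```

Only 1 + 2 + 4 + 2 + 2 = 11 numbers are needed here, against 2⁵ − 1 = 31 for an unrestricted joint distribution over five binary variables, which is the practical payoff of the Markov factorization.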
The d-separation criterion
▸ d-separation: a graphical criterion that captures exactly all the C.I. relationships implied by the M.C.*
▸ Consider a graph G with distinct nodes X, Y and a set of nodes W, where neither X nor Y belongs to W. We say that X and Y are d-separated given W in G iff there exists no undirected path U between X and Y such that:
1. every collider C (→ C ←) on U is in W or has a descendant in W, and
2. no other vertex on U is in W.
• If there is such a path, then X and Y are d-connected given W.
(cf. Spirtes et al. 2000: 14)
* Including those derived from the M.C. through symmetry, decomposition and weak union.
The d-separation criterion (Pearl’s definition)
▸ Consider a graph G with distinct nodes X, Y and a set of nodes W, where neither X nor Y belongs to W. A path U is said to be d-separated by a set of nodes W iff
1. U contains a chain (→ C → or ← C ←) or a fork (← C →) such that the middle node C ∈ W, or
2. U contains a collider C (→ C ←) such that C ∉ W and no descendant of C is in W.
• A set W is said to d-separate X from Y iff every path from X to Y is d-separated by W.
• Otherwise X and Y are d-connected by W.
(cf. Pearl 2000: 16-17)
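Pearl's definition translates almost line by line into a brute-force check. The sketch below enumerates every path between X and Y and tests the two blocking conditions; this is exponential in the worst case and only meant for small graphs. The lecture DAG is encoded with the edge set implied by its factorization, `collider` is a hypothetical toy graph, and all function names are illustrative.

```python
def children(g, v):
    return {c for c, pa in g.items() if v in pa}

def descendants(g, v):
    """Proper descendants of v in the DAG g (child -> set of parents)."""
    found, stack = set(), [v]
    while stack:
        for c in children(g, stack.pop()):
            if c not in found:
                found.add(c)
                stack.append(c)
    return found

def paths(g, x, y, path=()):
    """All paths from x to y that ignore edge direction and repeat no vertex."""
    path = path + (x,)
    if x == y:
        yield path
        return
    for nxt in g[x] | children(g, x):
        if nxt not in path:
            yield from paths(g, nxt, y, path)

def blocked(g, path, w):
    """True if the path is d-separated (blocked) by the set w."""
    for a, c, b in zip(path, path[1:], path[2:]):
        if a in g[c] and b in g[c]:           # collider: a -> c <- b
            if c not in w and not (descendants(g, c) & w):
                return True
        elif c in w:                           # chain or fork through c
            return True
    return False

def d_separated(g, x, y, w):
    w = set(w)
    return all(blocked(g, p, w) for p in paths(g, x, y))

# Lecture DAG (edge set as implied by its factorization) and a toy collider.
dag = {"V1": {"V3"}, "V2": {"V1", "V3"}, "V3": set(),
       "V4": {"V2"}, "V5": {"V4"}}
collider = {"A": set(), "B": set(), "C": {"A", "B"}, "D": {"C"}}
```

For instance, `d_separated(dag, "V4", "V1", {"V2"})` is true, in line with statement (1) of the Markov-condition example, whereas in the toy graph A and B are d-separated by the empty set but become d-connected once the collider C or its descendant D is conditioned on.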
Reading List
• Spirtes, Glymour, Scheines (2000), Causation, Prediction, and Search, 2nd edition, MIT Press:
  • Chapters 1 and 2
• Pearl (2000), Causality: Models, Reasoning, and Inference, CUP:
  • Sections 1.1 and 1.2
• Spanos, A. (1999), Probability Theory and Statistical Inference, CUP:
  • Sections 2.2 and 6.4
Further reading:
• Cooper, G.F. (1999), An Overview of the Representation and Discovery of Causal Relationships Using Bayesian Networks, in C. Glymour and G.F. Cooper (eds.), Computation, Causation, and Discovery, MIT Press.
• Scheines, R. (1997), An Introduction to causal inference. www