Abstract
We address the problem of causal discovery in the two-variable case, given a sample from the joint distribution of the two variables. Since X → Y and Y → X are Markov equivalent, conditional-independence-based methods [Spirtes et al., 2000; Pearl, 2009] cannot recover the causal graph. Alternative methods introduce asymmetries between cause and effect by restricting the function class (e.g., [Hoyer et al., 2009]).
The proposed causal discovery method, CURE (Causal discovery with Unsupervised inverse REgression), is based on the principle of independence of causal mechanisms [Janzing and Schölkopf, 2010]. For two variables, the principle states that the marginal distribution of the cause, say P(X), and the conditional distribution of the effect given the cause, P(Y|X), are "independent" in the sense that they contain no information about each other. This independence can be violated in the backward direction: the distribution of the effect, P(Y), and the conditional P(X|Y) may contain information about each other because each inherits properties from both P(X) and P(Y|X). This introduces an asymmetry between cause and effect. For deterministic causal relations (Y = f(X)), all the information about the conditional P(Y|X) is contained in the function f. In this case, previous work [Janzing et al., 2012] formalizes "independence" as uncorrelatedness between log f' and the density of P(X), both viewed as random variables. For non-deterministic relations, we propose an implicit notion of independence, namely that p_{Y|X} cannot be estimated based on p_X (lower case denotes density). However, it may be possible to estimate p_{X|Y} based on the density of the effect, p_Y.
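The deterministic case described above can be illustrated with a small toy sketch. It computes a slope-based score in the spirit of the cited information-geometric work, approximating the expectation of log |f'| under the cause distribution. It assumes a uniform reference measure after rescaling both variables to [0, 1]. It is not part of CURE, and the helper names `rescale` and `slope_score` as well as the toy mechanism Y = X^3 are hypothetical choices made only for illustration.

```python
import math
import random


def rescale(values):
    """Min-max rescale a sequence to [0, 1] (assumed uniform reference measure)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def slope_score(cause, effect):
    """Mean log-slope of effect vs. cause over pairs adjacent in the cause ordering.

    This approximates the expectation of log|f'| under the cause distribution.
    """
    pairs = sorted(zip(rescale(cause), rescale(effect)))
    logs = []
    for (c0, e0), (c1, e1) in zip(pairs, pairs[1:]):
        dc, de = c1 - c0, e1 - e0
        if dc > 0 and de != 0:  # skip ties, where the slope is undefined
            logs.append(math.log(abs(de / dc)))
    return sum(logs) / len(logs)


rng = random.Random(0)
x = [rng.random() for _ in range(500)]  # cause: roughly uniform on [0, 1]
y = [v ** 3 for v in x]                 # toy deterministic, invertible mechanism

score_xy = slope_score(x, y)  # score for the hypothesis X -> Y
score_yx = slope_score(y, x)  # score for the hypothesis Y -> X
direction = "X -> Y" if score_xy < score_yx else "Y -> X"
print(direction)
```

In this noise-free toy setting the lower score falls on the true direction. The non-deterministic setting addressed by CURE requires the different, implicit notion of independence described above.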
In practice, we are given empirical data 𝐱 ∈ ℝ^N, 𝐲 ∈ ℝ^N from P(X,Y) (bold symbols denote sample vectors) and estimate p_{X|Y} based on 𝐲 alone, intentionally hiding 𝐱. The relationship between the observed 𝐲 and the latent 𝐱 is modeled by a Gaussian Process (GP). The required conditional p_{X|Y} is then estimated as p^𝐲_{X|Y} : (x, y) ↦ p(x | y, 𝐲), with p(x | y, 𝐲) estimated by marginalizing out the latent 𝐱 and the GP hyperparameters.
CURE infers the causal direction by applying the procedure above twice: once to estimate p_{X|Y} based only on the sample y, and once to estimate p_{Y|X} based only on the sample x. If the first estimate is better, X → Y is inferred; otherwise, Y → X. To evaluate each estimate of a conditional, we compare it to the estimate obtained using both x and y. CURE was evaluated on synthetic and real data and often outperformed existing methods. On the downside, its computational cost is comparatively high. This work was recently published at AISTATS 2015 [Sgouritsa et al., 2015].
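The decision rule described above can be sketched as follows. The sketch assumes that the quality of each estimated conditional is summarized by a negative log-likelihood (lower is better) and that "better" means a smaller gap between the estimate that hides one variable and the estimate that uses both. The abstract does not specify the exact measure, so this choice, the function name `infer_direction`, and the numbers in the usage line are hypothetical.

```python
def infer_direction(nll_x_given_y_hidden, nll_x_given_y_full,
                    nll_y_given_x_hidden, nll_y_given_x_full):
    """Pick the direction whose 'hidden-variable' estimate degrades least.

    *_hidden: score of the conditional estimated from one variable's sample only.
    *_full:   score of the same conditional estimated from both samples.
    Scores are assumed to be negative log-likelihoods (lower is better).
    """
    gap_x_given_y = nll_x_given_y_hidden - nll_x_given_y_full
    gap_y_given_x = nll_y_given_x_hidden - nll_y_given_x_full
    # If p_{X|Y} is estimated better from y alone than p_{Y|X} is from x alone,
    # infer X -> Y; otherwise infer Y -> X.
    return "X -> Y" if gap_x_given_y < gap_y_given_x else "Y -> X"


# Hypothetical scores, for illustration only.
example = infer_direction(1.20, 1.00, 2.10, 1.05)
print(example)
```

Comparing gaps rather than raw scores keeps the two directions comparable even when one conditional is intrinsically harder to estimate than the other.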