Abstract
In many empirical sciences, the causal mechanisms underlying various phenomena need to be studied. Structural equation modeling is a general framework for multivariate analysis that provides a powerful method for studying causal mechanisms. In many cases, however, classical structural equation modeling cannot estimate the causal directions of variables, because it explicitly or implicitly assumes Gaussianity and typically uses only the covariance structure of the data. In many applications, the data are non-Gaussian, which means that the data distribution may contain more information than the covariance matrix can capture. Many new methods have therefore been proposed recently that exploit this non-Gaussian structure to estimate the causal directions of variables. In this paper, we provide an overview of such recent developments in causal inference, focusing in particular on the non-Gaussian methods known as LiNGAM.
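As a brief illustration of why non-Gaussianity matters, the following self-contained numerical sketch (not taken from this paper; the two-variable setup, the uniform disturbances, and the squared-residual dependence check are illustrative assumptions) fits a linear regression in both directions and checks whether the residual is independent of the predictor. With non-Gaussian disturbances, only the true causal direction yields an independent residual; with Gaussian disturbances, the two directions are indistinguishable.

import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# True model: x -> y, with uniform (hence non-Gaussian) disturbances.
x = rng.uniform(-1.0, 1.0, n)
y = 0.8 * x + rng.uniform(-1.0, 1.0, n)

def dependence_after_regression(cause, effect):
    # Regress `effect` on `cause` by least squares, then return a crude
    # dependence proxy between regressor and residual: the correlation of
    # their squares (approximately zero when the two are independent).
    var_c, cov_ce = np.cov(cause, effect)[0]
    resid = effect - (cov_ce / var_c) * cause
    return np.corrcoef((cause - cause.mean()) ** 2,
                       (resid - resid.mean()) ** 2)[0, 1]

print("non-Gaussian, x -> y (true): ", dependence_after_regression(x, y))  # approx. 0
print("non-Gaussian, y -> x (false):", dependence_after_regression(y, x))  # clearly nonzero

# With Gaussian disturbances, the reverse-direction residual is also
# independent of its regressor, so the asymmetry disappears.
xg = rng.normal(size=n)
yg = 0.8 * xg + rng.normal(size=n)
print("Gaussian, x -> y:", dependence_after_regression(xg, yg))  # approx. 0
print("Gaussian, y -> x:", dependence_after_regression(yg, xg))  # also approx. 0

The squared-residual correlation above is only a simple stand-in for a proper independence measure; LiNGAM-type methods exploit independence more fully, for example via independent component analysis.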
Cite this article
Shimizu, S. LiNGAM: Non-Gaussian Methods for Estimating Causal Structures. Behaviormetrika 41, 65–98 (2014). https://doi.org/10.2333/bhmk.41.65