DOI: 10.5555/3692070.3694143
Research article

Optimal kernel choice for score function-based causal discovery

Published: 21 July 2024

Abstract

Score-based methods have demonstrated their effectiveness in discovering causal relationships by scoring different causal structures based on their goodness of fit to the data. Recently, Huang et al. (2018) proposed a generalized score function that can handle general data distributions and causal relationships by modeling the relations in reproducing kernel Hilbert space (RKHS). The selection of an appropriate kernel within this score function is crucial for accurately characterizing causal relationships and ensuring precise causal discovery. However, the current method selects kernel parameters by manual heuristics, making the process tedious and unlikely to yield an optimal choice. In this paper, we propose a kernel selection method within the generalized score function that automatically selects the kernel that best fits the data. Specifically, we model the generative process of the variables involved in each step of the causal graph search procedure as a mixture of independent noise variables. Based on this model, we derive an automatic kernel selection method by maximizing the marginal likelihood of the variables involved in each search step. We conduct experiments on both synthetic data and real-world benchmarks, and the results demonstrate that our proposed method outperforms heuristic kernel selection methods.
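The method sketched in the abstract replaces heuristic kernel choices (e.g., the median distance) with hyperparameters chosen by maximizing a marginal likelihood. As a hedged illustration of that core idea only, not the authors' exact objective, the following minimal NumPy/SciPy sketch selects an RBF kernel bandwidth and noise variance for a single toy parent-child regression pair by minimizing the negative Gaussian-process log marginal likelihood; all names and the data here are illustrative assumptions.

    import numpy as np
    from scipy.optimize import minimize

    def rbf_kernel(x, z, bandwidth):
        # Gaussian/RBF kernel: k(x_i, z_j) = exp(-||x_i - z_j||^2 / (2 * bandwidth^2))
        sq_dists = ((x[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

    def neg_log_marginal_likelihood(log_params, x, y):
        # Log-parameterization keeps the bandwidth and noise variance positive.
        bandwidth, noise_var = np.exp(log_params)
        n = y.shape[0]
        K = rbf_kernel(x, x, bandwidth) + noise_var * np.eye(n)
        L = np.linalg.cholesky(K)
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # alpha = K^{-1} y
        # -log p(y) = 0.5 * y^T K^{-1} y + 0.5 * log|K| + (n/2) * log(2*pi)
        return 0.5 * y @ alpha + np.log(np.diag(L)).sum() + 0.5 * n * np.log(2 * np.pi)

    # Toy (parent, child) pair standing in for one step of the graph search.
    rng = np.random.default_rng(0)
    x = rng.uniform(-3.0, 3.0, size=(100, 1))
    y = np.sin(x[:, 0]) + 0.1 * rng.standard_normal(100)

    result = minimize(neg_log_marginal_likelihood, x0=np.log([1.0, 0.1]),
                      args=(x, y), method="L-BFGS-B")
    bandwidth, noise_var = np.exp(result.x)
    print(f"selected bandwidth = {bandwidth:.3f}, noise variance = {noise_var:.4f}")

In the paper's setting, an analogous likelihood is maximized for the kernels entering the generalized score at every search step; the sketch shows only the single-pair hyperparameter selection that would otherwise be fixed by hand.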

References

[1]
Adriaans, P. and van Benthem, J. Handbook of Philosophy of Information. Elsevier, 2008.
[2]
Antonakis, J. and Lalive, R. Counterfactuals and causal inference: Methods and principles for social research. Structural Equation Modeling, 18(1):152-159, 2011.
[3]
Baker, C. R. Joint measures and cross-covariance operators. Transactions of the American Mathematical Society, 186:273-289, 1973.
[4]
Ben-Israel, A. The change-of-variables formula using matrix volume. SIAM Journal on Matrix Analysis and Applications, 21(1):300-312, 1999.
[5]
Bühlmann, P., Peters, J., and Ernest, J. CAM: Causal additive models, high-dimensional order search and penalized regression. The Annals of Statistics, 42(6):2526-2556, 2014.
[6]
Chapelle, O. and Vapnik, V. Model selection for support vector machines. Advances in Neural Information Processing Systems, 12, 1999.
[7]
Chickering, D. M. Optimal structure identification with greedy search. Journal of Machine Learning Research, 3(Nov):507-554, 2002.
[8]
Chu, C.-K. and Marron, J. S. Choosing a kernel regression estimator. Statistical Science, pp. 404-419, 1991.
[9]
Friedman, N. and Nachman, I. Gaussian process networks. In Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence, pp. 211-219, 2000.
[10]
Fukumizu, K., Bach, F. R., and Jordan, M. I. Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces. Journal of Machine Learning Research, 5(Jan):73-99, 2004.
[11]
Fukumizu, K., Gretton, A., Sun, X., and Schölkopf, B. Kernel measures of conditional dependence. Advances in Neural Information Processing Systems, 20, 2007.
[12]
Fukumizu, K., Bach, F. R., and Jordan, M. I. Kernel dimension reduction in regression. The Annals of Statistics, 37(4):1871-1905, 2009.
[13]
Garreau, D., Jitkrittum, W., and Kanagawa, M. Large sample analysis of the median heuristic. arXiv preprint arXiv:1707.07269, 2017.
[14]
Geiger, D. and Heckerman, D. Learning Gaussian networks. In Proceedings of Conference on Uncertainty in Artificial Intelligence, pp. 235-243, 1994.
[15]
Gretton, A., Bousquet, O., Smola, A., and Schölkopf, B. Measuring statistical dependence with Hilbert-Schmidt norms. In International Conference on Algorithmic Learning Theory, pp. 63-77. Springer, 2005.
[16]
Gretton, A., Fukumizu, K., Teo, C., Song, L., Schölkopf, B., and Smola, A. A kernel statistical test of independence. Advances in Neural Information Processing Systems, 20, 2007.
[17]
Gretton, A., Borgwardt, K. M., Rasch, M. J., Schölkopf, B., and Smola, A. A kernel two-sample test. Journal of Machine Learning Research, 13, 2012a.
[18]
Gretton, A., Sriperumbudur, B., Sejdinovic, D., Strathmann, H., Balakrishnan, S., Pontil, M., and Fukumizu, K. Optimal kernel choice for large-scale two-sample tests. In Proceedings of Conference on Advances in Neural Information Processing Systems, pp. 1205-1213, 2012b.
[19]
Haughton, D. M. On the choice of a model to fit data from an exponential family. The Annals of Statistics, 16:342-355, 1988.
[20]
Herrmann, E., Gasser, T., and Kneip, A. Choice of bandwidth for kernel regression when residuals are correlated. Biometrika, 79(4):783-795, 1992.
[21]
Huang, B., Zhang, K., Lin, Y., Schölkopf, B., and Glymour, C. Generalized score functions for causal discovery. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1551-1560, 2018.
[22]
Hyvärinen, A. and Smith, S. M. Pairwise likelihood ratios for estimation of non-Gaussian structural equation models. Journal of Machine Learning Research, 14:111-152, 2013.
[23]
Keropyan, G., Strieder, D., and Drton, M. Rank-based causal discovery for post-nonlinear models. In International Conference on Artificial Intelligence and Statistics, pp. 7849-7870. PMLR, 2023.
[24]
Kim, S.-J., Magnani, A., and Boyd, S. Optimal kernel selection in kernel Fisher discriminant analysis. In Proceedings of the 23rd International Conference on Machine Learning, pp. 465-472, 2006.
[25]
Kingma, D. P. and Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[26]
Liu, D. C. and Nocedal, J. On the limited memory BFGS method for large scale optimization. Mathematical Programming, 45(1-3):503-528, 1989.
[27]
Liu, F., Xu, W., Lu, J., Zhang, G., Gretton, A., and Sutherland, D. J. Learning deep kernels for non-parametric two-sample tests. In International Conference on Machine Learning, pp. 6316-6326. PMLR, 2020.
[28]
Liu, Q., Lee, J., and Jordan, M. A kernelized Stein discrepancy for goodness-of-fit tests. In International Conference on Machine Learning, pp. 276-284. PMLR, 2016.
[29]
Londei, A., D'Ausilio, A., Basso, D., and Belardinelli, M. O. A new method for detecting causality in fMRI data of cognitive processing. Cognitive Processing, 7:42-52, 2006.
[30]
Lorch, L., Rothfuss, J., Schölkopf, B., and Krause, A. DiBS: Differentiable Bayesian structure learning. Advances in Neural Information Processing Systems, 34:24111-24123, 2021.
[31]
Maxwell Chickering, D. and Heckerman, D. Efficient approximations for the marginal likelihood of Bayesian networks with hidden variables. Machine Learning, 29:181-212, 1997.
[32]
Moneta, A., Entner, D., Hoyer, P. O., and Coad, A. Causal inference by independent component analysis: Theory and applications. Oxford Bulletin of Economics and Statistics, 75(5):705-730, 2013.
[33]
Ramdas, A., Reddi, S. J., Póczos, B., Singh, A., and Wasserman, L. On the decreasing power of kernel and distance based nonparametric hypothesis tests in high dimensions. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 29, 2015.
[34]
Schölkopf, B., Smola, A. J., Bach, F., et al. Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT Press, 2002.
[35]
Schwarz, G. Estimating the dimension of a model. Annals of Statistics, 6(2):461-464, 1978.
[36]
Silander, T. and Myllymäki, P. A simple approach for finding the globally optimal Bayesian network structure. In Proceedings of Conference on Uncertainty in Artificial Intelligence, pp. 445-452, 2006.
[37]
Sokolova, E., Groot, P., Claassen, T., and Heskes, T. Causal discovery from databases with discrete and continuous variables. In Lecture Notes in Computer Science, vol. 8754, pp. 442-457, 2014.
[38]
Spirtes, P. and Zhang, K. Causal discovery and inference: concepts and recent methodological advances. In Applied informatics, volume 3, pp. 1-28, 2016.
[39]
Spirtes, P., Glymour, C. N., Scheines, R., and Heckerman, D. Causation, prediction, and search. MIT Press, 2000.
[40]
Steinwart, I. and Christmann, A. Support vector machines. Springer Science & Business Media, 2008.
[41]
Sutherland, D. J., Tung, H.-Y., Strathmann, H., De, S., Ramdas, A., Smola, A., and Gretton, A. Generative models and model criticism via optimized maximum mean discrepancy. In International Conference on Learning Representations, 2021.
[42]
Tsamardinos, I., Brown, L. E., and Aliferis, C. F. The max-min hill-climbing Bayesian network structure learning algorithm. Machine Learning, 65:31-78, 2006.
[43]
Uemura, K., Takagi, T., Takayuki, K., Yoshida, H., and Shimizu, S. A multivariate causal discovery based on post-nonlinear model. In Proceedings of Conference on Causal Learning and Reasoning, pp. 826-839, 2022.
[44]
Vowels, M. J., Camgoz, N. C., and Bowden, R. D'ya like DAGs? A survey on structure learning and causal discovery. ACM Computing Surveys, 55(4):1-36, 2022.
[45]
Williams, C. and Rasmussen, C. Gaussian processes for regression. Advances in Neural Information Processing Systems, 8, 1995.
[46]
Yu, Y., Chen, J., Gao, T., and Yu, M. DAG-GNN: DAG structure learning with graph neural networks. In International Conference on Machine Learning, pp. 7154-7163. PMLR, 2019.
[47]
Yuan, C. and Malone, B. Learning optimal Bayesian networks: A shortest path perspective. Journal of Artificial Intelligence Research, 48:23-65, 2013.
[48]
Zhang, K., Peters, J., Janzing, D., and Schölkopf, B. Kernel-based conditional independence test and application in causal discovery. In Proceedings of Conference on Uncertainty in Artificial Intelligence, pp. 804-813, 2011.
[49]
Zhang, K., Schölkopf, B., Spirtes, P., and Glymour, C. Learning causality and causality-related learning: some recent progress. National Science Review, 5(1):26-29, 2018.
[50]
Zheng, X., Aragam, B., Ravikumar, P. K., and Xing, E. P. DAGs with NO TEARS: Continuous optimization for structure learning. Advances in Neural Information Processing Systems, 31, 2018.
[51]
Zheng, X., Dan, C., Aragam, B., Ravikumar, P., and Xing, E. Learning sparse nonparametric DAGs. In International Conference on Artificial Intelligence and Statistics, pp. 3414-3425. PMLR, 2020.
[52]
Zhu, S., Ng, I., and Chen, Z. Causal discovery with reinforcement learning. In International Conference on Learning Representations, 2019.

Published In

ICML'24: Proceedings of the 41st International Conference on Machine Learning
July 2024
63010 pages

Publisher

JMLR.org

