DOI: 10.5555/3692070.3694143
Research article

Optimal kernel choice for score function-based causal discovery

Published: 21 July 2024

Abstract

Score-based methods have demonstrated their effectiveness in discovering causal relationships by scoring different causal structures based on their goodness of fit to the data. Recently, Huang et al. (2018) proposed a generalized score function that can handle general data distributions and causal relationships by modeling the relations in reproducing kernel Hilbert space (RKHS). The selection of an appropriate kernel within this score function is crucial for accurately characterizing causal relationships and ensuring precise causal discovery. However, the current method selects kernel parameters by manual heuristics, making the process tedious and unlikely to yield an optimal choice. In this paper, we propose a kernel selection method within the generalized score function that automatically selects the kernel that best fits the data. Specifically, we model the generative process of the variables involved in each step of the causal graph search procedure as a mixture of independent noise variables. Based on this model, we derive an automatic kernel selection method by maximizing the marginal likelihood of the variables involved in each search step. We conduct experiments on both synthetic data and real-world benchmarks, and the results demonstrate that our proposed method outperforms heuristic kernel selection methods.
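The method sketched in the abstract replaces heuristic kernel choices (e.g., the median distance) with hyperparameters chosen by maximizing a marginal likelihood. As a hedged illustration of that core idea only, not the authors' exact objective, the following minimal NumPy/SciPy sketch selects an RBF kernel bandwidth and noise variance for a single toy parent-child regression pair by minimizing the negative Gaussian-process log marginal likelihood; all names and the data here are illustrative assumptions.

    import numpy as np
    from scipy.optimize import minimize

    def rbf_kernel(x, z, bandwidth):
        # Gaussian/RBF kernel: k(x_i, z_j) = exp(-||x_i - z_j||^2 / (2 * bandwidth^2))
        sq_dists = ((x[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

    def neg_log_marginal_likelihood(log_params, x, y):
        # Log-parameterization keeps the bandwidth and noise variance positive.
        bandwidth, noise_var = np.exp(log_params)
        n = y.shape[0]
        K = rbf_kernel(x, x, bandwidth) + noise_var * np.eye(n)
        L = np.linalg.cholesky(K)
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # alpha = K^{-1} y
        # -log p(y) = 0.5 * y^T K^{-1} y + 0.5 * log|K| + (n/2) * log(2*pi)
        return 0.5 * y @ alpha + np.log(np.diag(L)).sum() + 0.5 * n * np.log(2 * np.pi)

    # Toy (parent, child) pair standing in for one step of the graph search.
    rng = np.random.default_rng(0)
    x = rng.uniform(-3.0, 3.0, size=(100, 1))
    y = np.sin(x[:, 0]) + 0.1 * rng.standard_normal(100)

    result = minimize(neg_log_marginal_likelihood, x0=np.log([1.0, 0.1]),
                      args=(x, y), method="L-BFGS-B")
    bandwidth, noise_var = np.exp(result.x)
    print(f"selected bandwidth = {bandwidth:.3f}, noise variance = {noise_var:.4f}")

In the paper's setting, an analogous likelihood is maximized for the kernels entering the generalized score at every search step; the sketch shows only the single-pair hyperparameter selection that would otherwise be fixed by hand.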

References

[1]
Adriaans, P. and van Benthem, J. Handbook of Philosophy of Information. Elsevier, 2008.
[2]
Antonakis, J. and Lalive, R. Counterfactuals and causal inference: Methods and principles for social research. Structural Equation Modeling, 18(1):152-159, 2011.
[3]
Baker, C. R. Joint measures and cross-covariance operators. Transactions of the American Mathematical Society, 186:273-289, 1973.
[4]
Ben-Israel, A. The change-of-variables formula using matrix volume. SIAM Journal on Matrix Analysis and Applications, 21(1):300-312, 1999.
[5]
Bühlmann, P., Peters, J., and Ernest, J. CAM: Causal additive models, high-dimensional order search and penalized regression. The Annals of Statistics, 42(6):2526-2556, 2014.
[6]
Chapelle, O. and Vapnik, V. Model selection for support vector machines. Advances in Neural Information Processing Systems, 12, 1999.
[7]
Chickering, D. M. Optimal structure identification with greedy search. Journal of Machine Learning Research, 3(Nov):507-554, 2002.
[8]
Chu, C.-K. and Marron, J. S. Choosing a kernel regression estimator. Statistical Science, pp. 404-419, 1991.
[9]
Friedman, N. and Nachman, I. Gaussian process networks. In Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence, pp. 211-219, 2000.
[10]
Fukumizu, K., Bach, F. R., and Jordan, M. I. Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces. Journal of Machine Learning Research, 5(Jan):73-99, 2004.
[11]
Fukumizu, K., Gretton, A., Sun, X., and Schölkopf, B. Kernel measures of conditional dependence. Advances in Neural Information Processing Systems, 20, 2007.
[12]
Fukumizu, K., Bach, F. R., and Jordan, M. I. Kernel dimension reduction in regression. The Annals of Statistics, 37(4):1871-1905, 2009.
[13]
Garreau, D., Jitkrittum, W., and Kanagawa, M. Large sample analysis of the median heuristic. arXiv preprint arXiv:1707.07269, 2017.
[14]
Geiger, D. and Heckerman, D. Learning Gaussian networks. In Proceedings of Conference on Uncertainty in Artificial Intelligence, pp. 235-243, 1994.
[15]
Gretton, A., Bousquet, O., Smola, A., and Schölkopf, B. Measuring statistical dependence with Hilbert-Schmidt norms. In International Conference on Algorithmic Learning Theory, pp. 63-77. Springer, 2005.
[16]
Gretton, A., Fukumizu, K., Teo, C., Song, L., Schölkopf, B., and Smola, A. A kernel statistical test of independence. Advances in Neural Information Processing Systems, 20, 2007.
[17]
Gretton, A., Borgwardt, K. M., Rasch, M. J., Schölkopf, B., and Smola, A. A kernel two-sample test. Journal of Machine Learning Research, 13, 2012a.
[18]
Gretton, A., Sriperumbudur, B., Sejdinovic, D., Strathmann, H., Balakrishnan, S., Pontil, M., and Fukumizu, K. Optimal kernel choice for large-scale two-sample tests. In Proceedings of Conference on Advances in Neural Information Processing Systems, pp. 1205-1213, 2012b.
[19]
Haughton, D. M. On the choice of a model to fit data from an exponential family. The Annals of Statistics, 16:342-355, 1988.
[20]
Herrmann, E., Gasser, T., and Kneip, A. Choice of bandwidth for kernel regression when residuals are correlated. Biometrika, 79(4):783-795, 1992.
[21]
Huang, B., Zhang, K., Lin, Y., Schölkopf, B., and Glymour, C. Generalized score functions for causal discovery. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1551-1560, 2018.
[22]
Hyvärinen, A. and Smith, S. M. Pairwise likelihood ratios for estimation of non-Gaussian structural equation models. Journal of Machine Learning Research, 14:111-152, 2013.
[23]
Keropyan, G., Strieder, D., and Drton, M. Rank-based causal discovery for post-nonlinear models. In International Conference on Artificial Intelligence and Statistics, pp. 7849-7870. PMLR, 2023.
[24]
Kim, S.-J., Magnani, A., and Boyd, S. Optimal kernel selection in kernel Fisher discriminant analysis. In Proceedings of the 23rd International Conference on Machine Learning, pp. 465-472, 2006.
[25]
Kingma, D. P. and Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[26]
Liu, D. C. and Nocedal, J. On the limited memory BFGS method for large scale optimization. Mathematical Programming, 45(1-3):503-528, 1989.
[27]
Liu, F., Xu, W., Lu, J., Zhang, G., Gretton, A., and Sutherland, D. J. Learning deep kernels for non-parametric two-sample tests. In International Conference on Machine Learning, pp. 6316-6326. PMLR, 2020.
[28]
Liu, Q., Lee, J., and Jordan, M. A kernelized Stein discrepancy for goodness-of-fit tests. In International Conference on Machine Learning, pp. 276-284. PMLR, 2016.
[29]
Londei, A., D'Ausilio, A., Basso, D., and Belardinelli, M. O. A new method for detecting causality in fMRI data of cognitive processing. Cognitive Processing, 7:42-52, 2006.
[30]
Lorch, L., Rothfuss, J., Schölkopf, B., and Krause, A. DiBS: Differentiable Bayesian structure learning. Advances in Neural Information Processing Systems, 34:24111-24123, 2021.
[31]
Maxwell Chickering, D. and Heckerman, D. Efficient approximations for the marginal likelihood of Bayesian networks with hidden variables. Machine Learning, 29:181-212, 1997.
[32]
Moneta, A., Entner, D., Hoyer, P. O., and Coad, A. Causal inference by independent component analysis: Theory and applications. Oxford Bulletin of Economics and Statistics, 75(5):705-730, 2013.
[33]
Ramdas, A., Reddi, S. J., Póczos, B., Singh, A., and Wasserman, L. On the decreasing power of kernel and distance based nonparametric hypothesis tests in high dimensions. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 29, 2015.
[34]
Schölkopf, B., Smola, A. J., Bach, F., et al. Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT Press, 2002.
[35]
Schwarz, G. Estimating the dimension of a model. Annals of Statistics, 6(2):461-464, 1978.
[36]
Silander, T. and Myllymäki, P. A simple approach for finding the globally optimal Bayesian network structure. In Proceedings of Conference on Uncertainty in Artificial Intelligence, pp. 445-452, 2006.
[37]
Sokolova, E., Groot, P., Claassen, T., and Heskes, T. Causal discovery from databases with discrete and continuous variables. In Lecture Notes in Computer Science, vol. 8754, pp. 442-457, 2014.
[38]
Spirtes, P. and Zhang, K. Causal discovery and inference: concepts and recent methodological advances. In Applied informatics, volume 3, pp. 1-28, 2016.
[39]
Spirtes, P., Glymour, C. N., Scheines, R., and Heckerman, D. Causation, prediction, and search. MIT Press, 2000.
[40]
Steinwart, I. and Christmann, A. Support vector machines. Springer Science & Business Media, 2008.
[41]
Sutherland, D. J., Tung, H.-Y., Strathmann, H., De, S., Ramdas, A., Smola, A., and Gretton, A. Generative models and model criticism via optimized maximum mean discrepancy. In International Conference on Learning Representations, 2021.
[42]
Tsamardinos, I., Brown, L. E., and Aliferis, C. F. The max-min hill-climbing Bayesian network structure learning algorithm. Machine Learning, 65:31-78, 2006.
[43]
Uemura, K., Takagi, T., Takayuki, K., Yoshida, H., and Shimizu, S. A multivariate causal discovery based on post-nonlinear model. In Proceedings of Conference on Causal Learning and Reasoning, pp. 826-839, 2022.
[44]
Vowels, M. J., Camgoz, N. C., and Bowden, R. D'ya like DAGs? A survey on structure learning and causal discovery. ACM Computing Surveys, 55(4):1-36, 2022.
[45]
Williams, C. and Rasmussen, C. Gaussian processes for regression. Advances in Neural Information Processing Systems, 8, 1995.
[46]
Yu, Y., Chen, J., Gao, T., and Yu, M. DAG-GNN: DAG structure learning with graph neural networks. In International Conference on Machine Learning, pp. 7154-7163. PMLR, 2019.
[47]
Yuan, C. and Malone, B. Learning optimal Bayesian networks: A shortest path perspective. Journal of Artificial Intelligence Research, 48:23-65, 2013.
[48]
Zhang, K., Peters, J., Janzing, D., and Schölkopf, B. Kernel-based conditional independence test and application in causal discovery. In Proceedings of Conference on Uncertainty in Artificial Intelligence, pp. 804-813, 2011.
[49]
Zhang, K., Schölkopf, B., Spirtes, P., and Glymour, C. Learning causality and causality-related learning: some recent progress. National Science Review, 5(1):26-29, 2018.
[50]
Zheng, X., Aragam, B., Ravikumar, P. K., and Xing, E. P. DAGs with NO TEARS: Continuous optimization for structure learning. Advances in Neural Information Processing Systems, 31, 2018.
[51]
Zheng, X., Dan, C., Aragam, B., Ravikumar, P., and Xing, E. Learning sparse nonparametric DAGs. In International Conference on Artificial Intelligence and Statistics, pp. 3414-3425. PMLR, 2020.
[52]
Zhu, S., Ng, I., and Chen, Z. Causal discovery with reinforcement learning. In International Conference on Learning Representations, 2019.

Published In

ICML'24: Proceedings of the 41st International Conference on Machine Learning
July 2024
63010 pages

Publisher

JMLR.org

