rapid-communication

Sample selection bias in evaluation of prediction performance of causal models

Published: 10 January 2022

Abstract

Causal models are notoriously difficult to validate because they make untestable assumptions regarding confounding. New scientific experiments offer the possibility of evaluating causal models using prediction performance. Prediction performance measures are typically robust to violations of causal assumptions. However, prediction performance does depend on the selection of training and test sets. Biased training sets can lead to optimistic assessments of model performance. In this work, we revisit the prediction performance of several recently proposed causal models tested on the genetic perturbation data set of Kemmeren et al. [5]. We find that sample selection bias is likely a key driver of model performance. We propose using a less‐biased evaluation set for assessing prediction performance and compare models on this new set. In this setting, the causal models have similar or worse performance compared to standard association‐based estimators such as Lasso. Finally, we compare the performance of causal estimators in simulation studies that reproduce the structure of the Kemmeren genetic knockout experiments but without any sample selection bias. These results provide an improved understanding of the performance of several causal models and offer guidance on how future studies should use the Kemmeren data set.
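
The general point that a selectively sampled evaluation set can flatter a model can be illustrated with a toy sketch. This is not the analysis from the article and does not use the Kemmeren data. The quadratic ground truth, the sampling ranges, the noise level, and every function name below are assumptions chosen purely for illustration. A straight line is fitted on samples drawn from a narrow region, then scored once on a test set drawn from that same narrow region and once on a test set drawn from the wider population.

```python
import random


def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mse(intercept, slope, xs, ys):
    """Mean squared prediction error of the fitted line."""
    errors = [(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


def draw(rng, n, low, high):
    """Sample x uniformly on [low, high]; y = x^2 plus Gaussian noise (assumed truth)."""
    xs = [rng.uniform(low, high) for _ in range(n)]
    ys = [x * x + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys


def compare_evaluations(seed=0, n=500):
    """Return (MSE on selectively sampled test set, MSE on population-wide test set)."""
    rng = random.Random(seed)
    # Training data comes only from the narrow region [0, 1] (the selection).
    train_x, train_y = draw(rng, n, 0.0, 1.0)
    intercept, slope = fit_line(train_x, train_y)
    # Test set that shares the same selection as the training data.
    biased_x, biased_y = draw(rng, n, 0.0, 1.0)
    # Test set drawn from the full (assumed) population range [-2, 2].
    full_x, full_y = draw(rng, n, -2.0, 2.0)
    return (
        mse(intercept, slope, biased_x, biased_y),
        mse(intercept, slope, full_x, full_y),
    )


if __name__ == "__main__":
    biased_mse, full_mse = compare_evaluations()
    print(f"MSE on selectively sampled test set: {biased_mse:.3f}")
    print(f"MSE on population-wide test set:     {full_mse:.3f}")
```

In this sketch the error on the selectively sampled test set is far smaller than the error on the population-wide one, so an evaluation that shares the training set's selection gives an optimistic picture of the same fitted model.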

References

[1] L. Breiman, Statistical modeling: The two cultures (with comments and a rejoinder by the author), Stat. Sci. 16 (2001), no. 3, 199–231.
[2] J. Friedman, T. Hastie, and R. Tibshirani, Regularization paths for generalized linear models via coordinate descent, J. Stat. Softw. 33 (2010), no. 1, 1–22.
[3] A. Globerson and S. Roweis, Nightmare at test time: Robust learning by feature deletion, Proceedings of the 23rd International Conference on Machine Learning, 353–360, 2006.
[4] J. Huang, A. Gretton, K. Borgwardt, B. Schölkopf, and A. Smola, Correcting sample selection bias by unlabeled data, Adv. Neural Inf. Process. Syst. 19 (2006), 601–608.
[5] P. Kemmeren, K. Sameith, L. A. Van De Pasch, J. J. Benschop, T. L. Lenstra, T. Margaritis, E. O'Duibhir, E. Apweiler, S. van Wageningen, C. W. Ko, S. van Heesch, M. M. Kashani, G. Ampatziadis‐Michailidis, M. O. Brok, N. A. C. H. Brabers, A. J. Miles, D. Bouwmeester, S. R. van Hooff, H. van Bakel, E. Sluiters, L. V. Bakker, B. Snel, P. Lijnzaad, D. van Leenen, M. J. A. Groot Koerkamp, and F. C. P. Holstege, Large‐scale genetic perturbations reveal regulatory networks and an abundance of gene‐specific repressors, Cell 157 (2014), no. 3, 740–752.
[6] N. Meinshausen, A. Hauser, J. M. Mooij, J. Peters, P. Versteeg, and P. Bühlmann, Methods for causal inference from gene perturbation experiments and validation, Proc. Natl. Acad. Sci. 113 (2016), no. 27, 7361–7368.
[7] J. Pearl, Causal inference in statistics: An overview, Stat. Surv. 3 (2009), 96–146.
[8] J. Peters, P. Bühlmann, and N. Meinshausen, Causal inference by using invariant prediction: Identification and confidence intervals, J. R. Stat. Soc. Series B Stat. Methodol. 78 (2016), 947–1012.
[9] D. Rothenhäusler, P. Bühlmann, and N. Meinshausen, Causal Dantzig: Fast inference in linear structural equation models with hidden variables under additive interventions, Ann. Stat. 47 (2019), no. 3, 1688–1722.
[10] D. B. Rubin, Causal inference using potential outcomes: Design, modeling, decisions, J. Am. Stat. Assoc. 100 (2005), no. 469, 322–331.
[11] P. Versteeg and J. M. Mooij, Boosting local causal discovery in high‐dimensional expression data, 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), IEEE, 2599–2604, 2019.

Published In

Statistical Analysis and Data Mining, Volume 15, Issue 1
February 2022
138 pages
ISSN: 1932-1864
EISSN: 1932-1872
DOI: 10.1002/sam.v15.1

Publisher

John Wiley & Sons, Inc.
United States

Author Tags

1. causal inference
2. genetic perturbation experiments
3. prediction
4. sample selection bias

Qualifiers

• Rapid-communication
