
research-article

A Parallel Evolutionary Algorithm for Subset Selection in Causal Inference Models

Authors:

Wendy K. Tam Cho,

Yan Y. Liu

XSEDE16: Proceedings of the XSEDE16 Conference on Diversity, Big Data, and Science at Scale

Article No.: 7, Pages 1 - 8

https://doi.org/10.1145/2949550.2949568

Published: 17 July 2016

Abstract

Science is concerned with identifying causal inferences. To move beyond simple observed relationships and associational inferences, researchers may employ randomized experimental designs to isolate a treatment effect, which then permits causal inferences. When experiments are not practical, a researcher is relegated to analyzing observational data. To make causal inferences from observational data, one must adjust the data so that they resemble data that might have emerged from an experiment. Traditionally, this has occurred through statistical models identified as matching methods. We claim that matching methods are unnecessarily constraining and propose, instead, that the goal is better achieved via a subset selection procedure that is able to identify statistically indistinguishable treatment and control groups. This reformulation to identifying optimal subsets leads to a model that is computationally complex. We develop an evolutionary algorithm that is more efficient and identifies empirically more optimal solutions than any other causal inference method. To gain greater efficiency, we also develop a scalable algorithm for a parallel computing environment by enlisting additional processors to search a greater range of the solution space and to aid other processors at particularly difficult peaks.
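To make the subset selection idea concrete, the following is a minimal serial sketch of an evolutionary search for a control subset whose covariate means resemble those of the treatment group. It is not the authors' algorithm: the abstract does not specify the balance measure, the genetic operators, or the parallel scheme, so the absolute mean difference objective, the `recombine` and `mutate` operators, and all function and parameter names here are assumptions chosen for illustration.

```python
import random
from statistics import mean


def imbalance(treated, controls, subset):
    """Sum over covariates of |mean(treated) - mean(selected controls)|.

    Assumed balance measure; the paper may use a different one.
    """
    total = 0.0
    for j in range(len(treated[0])):
        t_mean = mean(row[j] for row in treated)
        c_mean = mean(controls[i][j] for i in subset)
        total += abs(t_mean - c_mean)
    return total


def recombine(parent_a, parent_b, k, rng):
    """Child keeps indices shared by both parents, then fills from their union."""
    child = set(parent_a) & set(parent_b)
    rest = sorted((set(parent_a) | set(parent_b)) - child)
    rng.shuffle(rest)
    child.update(rest[: k - len(child)])
    return frozenset(child)


def mutate(subset, pool_size, rng):
    """Swap one selected control for one unselected control."""
    outside = [i for i in range(pool_size) if i not in subset]
    if not outside:
        return subset
    drop = rng.choice(sorted(subset))
    add = rng.choice(outside)
    return frozenset((set(subset) - {drop}) | {add})


def evolve(treated, controls, k, pop_size=30, generations=200, seed=0):
    """Search for k control indices that minimize covariate imbalance."""
    rng = random.Random(seed)
    pool_size = len(controls)
    population = [frozenset(rng.sample(range(pool_size), k))
                  for _ in range(pop_size)]

    def score(subset):
        return imbalance(treated, controls, subset)

    for _ in range(generations):
        # Elitist selection: the better half survives unchanged.
        population.sort(key=score)
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            children.append(mutate(recombine(a, b, k, rng), pool_size, rng))
        population = survivors + children
    return min(population, key=score)
```

Calling `evolve(treated, controls, k)` with two lists of covariate rows returns a frozenset of `k` control indices. Because the better half of each generation survives unchanged, the best imbalance found never worsens from one generation to the next. A parallel variant in the spirit of the abstract might run several such populations on separate processors, but that design is not shown here.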


Cited By

Sharma D, Willy C, Bischoff J (2020). Optimal subset selection for causal inference using machine learning ensembles and particle swarm optimization. Complex & Intelligent Systems 7:1 (41-59). DOI: 10.1007/s40747-020-00169-w. Online publication date: 2-Jul-2020.
https://doi.org/10.1007/s40747-020-00169-w


Information & Contributors

Information

Published In


XSEDE16: Proceedings of the XSEDE16 Conference on Diversity, Big Data, and Science at Scale

July 2016

405 pages

ISBN:9781450347556

DOI:10.1145/2949550

General Chair:
Kelly Gaither
Texas Advanced Computing Center

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

SIGAPP: ACM Special Interest Group on Applied Computing
Xsede: Xsede
San Diego Supercomputer Center: San Diego Supercomputer Center
NICS: National Institute for Computational Sciences
University of Illinois: The University of Illinois at Urbana-Champaign

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Research-article
Research
Refereed limited

Funding Sources

Blue Waters

Conference

XSEDE16

XSEDE16: Diversity, Big Data, and Science at Scale

July 17 - 21, 2016

Miami, USA

Acceptance Rates

Overall Acceptance Rate 129 of 190 submissions, 68%

