DOI: 10.1145/2949550.2949568
research-article

A Parallel Evolutionary Algorithm for Subset Selection in Causal Inference Models

Published: 17 July 2016

Abstract

Science is concerned with making causal inferences. To move beyond simple observed relationships and associational inferences, researchers may employ randomized experimental designs to isolate a treatment effect, which then permits causal inferences. When experiments are not practical, a researcher is relegated to analyzing observational data. To make causal inferences from observational data, one must adjust the data so that they resemble data that might have emerged from an experiment. Traditionally, this has been done through statistical models known as matching methods. We claim that matching methods are unnecessarily constraining and propose, instead, that the goal is better achieved via a subset selection procedure that is able to identify statistically indistinguishable treatment and control groups. Reformulating the problem as one of identifying optimal subsets leads to a model that is computationally complex. We develop an evolutionary algorithm that is more efficient and empirically identifies better solutions than any other causal inference method. To gain greater efficiency, we also develop a scalable algorithm for a parallel computing environment that enlists additional processors to search a greater range of the solution space and to aid other processors at particularly difficult peaks.
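The abstract describes the method only at a high level. As a minimal sketch, assuming a fixed subset size k and a simple standardized mean-difference balance measure (both are our assumptions, not necessarily the paper's objective), a single-population evolutionary search for a control subset that resembles the treatment group might look like the following. The names `make_imbalance`, `crossover`, `mutate`, and `evolve` are hypothetical and for illustration only.

```python
import random
import statistics
from typing import List, Sequence

Row = Sequence[float]


def make_imbalance(treated: List[Row], controls: List[Row]):
    """Return a fitness function over control subsets; lower means better balance.

    Assumed illustrative measure (not necessarily the paper's): the sum over
    covariates of |treated mean - selected-control mean|, each scaled by the
    standard deviation of that covariate in the full control pool.
    """
    p = len(treated[0])
    t_means = [statistics.fmean(r[j] for r in treated) for j in range(p)]
    scales = [statistics.pstdev([r[j] for r in controls]) or 1.0 for j in range(p)]

    def imbalance(subset) -> float:
        return sum(
            abs(t_means[j] - statistics.fmean(controls[i][j] for i in subset)) / scales[j]
            for j in range(p)
        )

    return imbalance


def crossover(a: set, b: set, k: int, rng: random.Random) -> set:
    """Keep the controls both parents share; fill the rest from either parent."""
    child = a & b
    pool = list((a | b) - child)
    child |= set(rng.sample(pool, k - len(child)))
    return child


def mutate(s: set, n_controls: int, rng: random.Random) -> set:
    """Swap one selected control for one currently unselected control."""
    s = set(s)
    outside = [i for i in range(n_controls) if i not in s]
    if outside:
        s.remove(rng.choice(sorted(s)))
        s.add(rng.choice(outside))
    return s


def evolve(treated, controls, k, pop_size=30, generations=200, p_mut=0.3, seed=0):
    """Search for a size-k control subset with low imbalance.

    Returns (sorted best subset, its imbalance, best imbalance per generation).
    """
    rng = random.Random(seed)
    fitness = make_imbalance(treated, controls)
    n = len(controls)

    def tournament(population):
        # Binary tournament: the better of two random subsets becomes a parent.
        x, y = rng.sample(population, 2)
        return x if fitness(x) <= fitness(y) else y

    pop = [set(rng.sample(range(n), k)) for _ in range(pop_size)]
    best = min(pop, key=fitness)
    history = [fitness(best)]
    for _ in range(generations):
        children = [best]  # elitism: the best subset so far always survives
        while len(children) < pop_size:
            child = crossover(tournament(pop), tournament(pop), k, rng)
            if rng.random() < p_mut:
                child = mutate(child, n, rng)
            children.append(child)
        pop = children
        best = min(pop, key=fitness)
        history.append(fitness(best))
    return sorted(best), history[-1], history


if __name__ == "__main__":
    gen = random.Random(1)
    # Synthetic data: treated units are shifted relative to the control pool.
    treated = [[gen.gauss(1.0, 1.0), gen.gauss(0.5, 1.0)] for _ in range(10)]
    controls = [[gen.gauss(0.0, 1.0), gen.gauss(0.0, 1.0)] for _ in range(100)]
    subset, score, hist = evolve(treated, controls, k=10)
    print(f"initial imbalance {hist[0]:.3f} -> final {score:.3f}")
```

Because the best subset is copied into every new generation, the best imbalance can never get worse from one generation to the next. The parallel version described in the abstract is not reproduced here; one common design (an island model, assumed rather than taken from the paper) would run several such populations on separate processors and periodically exchange their best subsets through message passing.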

Cited By

  • (2020) Optimal subset selection for causal inference using machine learning ensembles and particle swarm optimization. Complex & Intelligent Systems 7(1):41-59. DOI: 10.1007/s40747-020-00169-w. Online publication date: 2 July 2020.

Information

Published In

ACM Other conferences
XSEDE16: Proceedings of the XSEDE16 Conference on Diversity, Big Data, and Science at Scale
July 2016
405 pages
ISBN:9781450347556
DOI:10.1145/2949550

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Combinatorial Optimization
  2. Evolutionary Algorithm
  3. Message Passing
  4. Parallel Computing

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Blue Waters

Conference

XSEDE16

Acceptance Rates

Overall Acceptance Rate 129 of 190 submissions, 68%

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 3
  • Downloads (last 6 weeks): 0

Reflects downloads up to 26 Dec 2024.
