Research article (Open access)
DOI: 10.1145/2983323.2983779

CRISP: Consensus Regularized Selection based Prediction

Published: 24 October 2016

Abstract

Integrating regularization methods with standard loss functions such as least squares or the hinge loss within a regression framework has become a popular way to learn predictive models with lower variance and better generalization ability. Regularizers also aid in building interpretable models from high-dimensional data, which makes them very appealing. Each regularizer is uniquely formulated to capture data-specific properties such as correlation, structured sparsity, or temporal smoothness. Obtaining a consensus among such diverse regularizers while learning a predictive model is therefore important for determining the optimal regularizer for the problem. The advantage of such an approach is that it preserves the simplicity of the final model by selecting a single candidate model, unlike ensemble methods, which combine multiple candidate models for prediction. We call this the consensus regularization problem; it has received little attention in the literature because of the inherent difficulty of learning and selecting a model from an integrated regularization framework. To solve this problem, we propose a method that generates a committee of non-convex regularized linear regression models and uses a consensus criterion to determine the optimal model for prediction. Each non-convex optimization problem in the committee is solved efficiently using the cyclic coordinate descent algorithm with a generalized thresholding operator. Our Consensus RegularIzation Selection based Prediction (CRISP) model is evaluated on electronic health records (EHRs) obtained from a large hospital for the congestive heart failure readmission prediction problem, as well as on high-dimensional synthetic datasets. The results indicate that CRISP outperforms several state-of-the-art methods, including additive, interaction-based, and other competing non-convex regularized linear regression methods.
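The abstract names two ingredients: a committee of non-convex regularized linear regressions fit by cyclic coordinate descent with a generalized thresholding operator, and a consensus criterion for selecting a single member. The sketch below illustrates both under stated assumptions: it uses the minimax concave penalty (MCP) as the non-convex regularizer and a toy "closest to the committee average" rule as the consensus criterion. The paper's actual committee construction and consensus measure are its own; everything here is illustrative.

```python
import numpy as np

def soft_threshold(z, lam):
    """Soft-thresholding: the proximal operator of the l1 (lasso) penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def mcp_threshold(z, lam, gamma=3.0):
    """Firm (generalized) thresholding operator for the minimax concave
    penalty (MCP), gamma > 1: lasso-like shrinkage near zero, no shrinkage
    for |z| > gamma * lam, so large coefficients stay nearly unbiased."""
    if abs(z) <= gamma * lam:
        return soft_threshold(z, lam) / (1.0 - 1.0 / gamma)
    return z

def cyclic_cd(X, y, lam, threshold=mcp_threshold, n_iters=200):
    """Cyclic coordinate descent for (1/2n)||y - X beta||^2 + penalty.
    Assumes each column of X is standardized so that x_j'x_j / n = 1."""
    n, p = X.shape
    beta = np.zeros(p)
    r = y.astype(float).copy()            # residual y - X beta (beta = 0)
    for _ in range(n_iters):
        for j in range(p):
            z = beta[j] + X[:, j] @ r / n  # univariate least-squares fit
            b_new = threshold(z, lam)      # apply thresholding operator
            r += X[:, j] * (beta[j] - b_new)
            beta[j] = b_new
    return beta

def consensus_select(X, y, lams):
    """Fit a committee over a grid of penalty weights and return the single
    member whose fitted values agree most with the committee average
    (an illustrative stand-in for the paper's consensus criterion)."""
    committee = [cyclic_cd(X, y, lam) for lam in lams]
    preds = np.stack([X @ b for b in committee])
    avg = preds.mean(axis=0)
    scores = np.linalg.norm(preds - avg, axis=1)
    return committee[int(np.argmin(scores))]
```

Because the output is a single coefficient vector rather than an averaged ensemble, the selected model remains as interpretable as any individual regularized fit, which is the simplicity argument the abstract makes against ensemble prediction.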


Cited By

  • A Novel Tensor-Based Temporal Multi-Task Survival Analysis Model. IEEE Transactions on Knowledge and Data Engineering, 33(9):3311-3322, September 2021. DOI: 10.1109/TKDE.2020.2967700
  • False Discovery Rate Control with Concave Penalties using Stability Selection. 2018 IEEE Data Science Workshop (DSW), pages 76-80, June 2018. DOI: 10.1109/DSW.2018.8439910


Published In

CIKM '16: Proceedings of the 25th ACM International Conference on Information and Knowledge Management
October 2016
2566 pages
ISBN:9781450340731
DOI:10.1145/2983323

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

  1. consensus prediction
  2. regression
  3. regularization

Conference

CIKM'16: ACM Conference on Information and Knowledge Management
October 24-28, 2016, Indianapolis, Indiana, USA

Acceptance Rates

CIKM '16 paper acceptance rate: 160 of 701 submissions (23%)
Overall acceptance rate: 1,861 of 8,427 submissions (22%)
