Printed from https://ideas.repec.org/a/eee/csdana/v178y2023ics0167947322001785.html

Visualization and assessment of model selection uncertainty

Author

Listed:
  • Qin, Yichen
  • Wang, Linna
  • Li, Yang
  • Li, Rong
Abstract
Although model selection is ubiquitous in scientific discovery, the stability and uncertainty of the selected model are often hard to evaluate. Characterizing the random behavior of the model selection procedure is the key to understanding and quantifying model selection uncertainty. To this end, several graphical tools, including G-plots and H-plots, are first proposed to visualize the distribution of the selected model. The concept of model selection deviation is then introduced to quantify model selection uncertainty. Similar to the standard error of an estimator, model selection deviation measures the stability of the model chosen by a model selection procedure. A bootstrap procedure for estimating this measure is discussed, and its desirable performance is demonstrated through simulation studies and real data analysis.
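
The abstract does not give the formal definition of model selection deviation, so the following Python sketch only illustrates the general idea of bootstrapping a model selection procedure and summarizing how much the selected model varies. Everything in it is an assumption made for illustration: the selection procedure (exhaustive best-subset search by BIC), the spread measure (root mean squared Hamming distance to the model selected on the full data), and the hypothetical function names `select_model_bic`, `bootstrap_selection`, and `selection_spread`. It should not be read as the authors' estimator.

```python
import itertools
import numpy as np


def select_model_bic(X, y):
    """Return a 0/1 inclusion vector for the predictor subset with the lowest BIC.

    Illustrative stand-in for "a model selection procedure"; exhaustive search,
    so only practical for a handful of predictors.
    """
    n, p = X.shape
    best_bic, best_mask = np.inf, None
    for k in range(p + 1):
        for subset in itertools.combinations(range(p), k):
            # Design matrix: intercept plus the chosen columns.
            Z = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
            beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
            rss = float(np.sum((y - Z @ beta) ** 2))
            bic = n * np.log(rss / n) + Z.shape[1] * np.log(n)
            if bic < best_bic:
                best_bic = bic
                best_mask = np.array([int(j in subset) for j in range(p)])
    return best_mask


def bootstrap_selection(X, y, n_boot=200, seed=0):
    """Rerun the selection on nonparametric bootstrap resamples of the rows."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    masks = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample rows with replacement
        masks.append(select_model_bic(X[idx], y[idx]))
    return np.array(masks)


def selection_spread(masks, reference):
    """Root mean squared Hamming distance between bootstrap models and a reference.

    An assumed, standard-error-like summary; not the paper's definition.
    """
    distances = np.sum(masks != reference, axis=1)
    return float(np.sqrt(np.mean(distances ** 2)))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 4))
    # Simulated data: only the first and last predictors matter.
    y = 3.0 * X[:, 0] + 0.3 * X[:, 3] + rng.normal(size=100)
    reference = select_model_bic(X, y)
    masks = bootstrap_selection(X, y, n_boot=100)
    print("selected on full data:", reference)
    print("bootstrap inclusion frequencies:", masks.mean(axis=0))
    print("spread around selected model:", selection_spread(masks, reference))
```

A spread of zero would mean every bootstrap resample reproduced the originally selected model; larger values indicate a less stable selection. The per-variable inclusion frequencies give a crude view of the distribution of the selected model, which is the kind of object the G-plots and H-plots are described as visualizing.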

Suggested Citation

  • Qin, Yichen & Wang, Linna & Li, Yang & Li, Rong, 2023. "Visualization and assessment of model selection uncertainty," Computational Statistics & Data Analysis, Elsevier, vol. 178(C).
  • Handle: RePEc:eee:csdana:v:178:y:2023:i:c:s0167947322001785
    DOI: 10.1016/j.csda.2022.107598

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947322001785
    Download Restriction: Full text for ScienceDirect subscribers only.

    File URL: https://libkey.io/10.1016/j.csda.2022.107598?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    As the access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. Yang Li & Yuetian Luo & Davide Ferrari & Xiaonan Hu & Yichen Qin, 2019. "Rejoinder to Discussions on: Model confidence bounds for variable selection," Biometrics, The International Biometric Society, vol. 75(2), pages 411-413, June.
    2. Zou, Hui, 2006. "The Adaptive Lasso and Its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 1418-1429, December.
    3. Jie Ding & Vahid Tarokh & Yuhong Yang, 2018. "Model Selection Techniques -- An Overview," Papers 1810.09583, arXiv.org.
    4. Behrendt, Simon & Schweikert, Karsten, 2021. "A Note on Adaptive Group Lasso for Structural Break Time Series," Econometrics and Statistics, Elsevier, vol. 17(C), pages 156-172.
    5. Chai, Hao & Zhang, Qingzhao & Jiang, Yu & Wang, Guohua & Zhang, Sanguo & Ahmed, Syed Ejaz & Ma, Shuangge, 2017. "Identifying gene-environment interactions for prognosis using a robust approach," Econometrics and Statistics, Elsevier, vol. 4(C), pages 105-120.
    6. Nicolai Meinshausen & Peter Bühlmann, 2010. "Stability selection," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 72(4), pages 417-473, September.
    7. Fan, Zhaohu & Reimherr, Matthew, 2017. "High-dimensional adaptive function-on-scalar regression," Econometrics and Statistics, Elsevier, vol. 1(C), pages 167-183.
    8. Friedman, Jerome H. & Hastie, Trevor & Tibshirani, Rob, 2010. "Regularization Paths for Generalized Linear Models via Coordinate Descent," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 33(i01).
    9. Peter R. Hansen & Asger Lunde & James M. Nason, 2011. "The Model Confidence Set," Econometrica, Econometric Society, vol. 79(2), pages 453-497, March.
    10. Lan Wang & Jianhui Zhou & Annie Qu, 2012. "Penalized Generalized Estimating Equations for High-Dimensional Longitudinal Data Analysis," Biometrics, The International Biometric Society, vol. 68(2), pages 353-360, June.
    11. Chris Chatfield, 1995. "Model Uncertainty, Data Mining and Statistical Inference," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 158(3), pages 419-444, May.
    12. Pötscher, Benedikt M. & Leeb, Hannes, 2009. "On the distribution of penalized maximum likelihood estimators: The LASSO, SCAD, and thresholding," Journal of Multivariate Analysis, Elsevier, vol. 100(9), pages 2065-2082, October.
    13. Fan J. & Li R., 2001. "Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 1348-1360, December.
    14. Chao Zheng & Davide Ferrari & Michael Zhang & Paul Baird, 2019. "Ranking the importance of genetic factors by variable‐selection confidence sets," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 68(3), pages 727-749, April.
    15. Yang Li & Rong Li & Yichen Qin & Mengyun Wu & Shuangge Ma, 2019. "Integrative interaction analysis using threshold gradient directed regularization," Applied Stochastic Models in Business and Industry, John Wiley & Sons, vol. 35(2), pages 354-375, March.
    16. Mousavi, Seyed Nourollah & Sørensen, Helle, 2017. "Multinomial functional regression with wavelets and LASSO penalization," Econometrics and Statistics, Elsevier, vol. 1(C), pages 150-166.
    17. Wenjing Yang & Yuhong Yang, 2017. "Toward an objective and reproducible model choice via variable selection deviation," Biometrics, The International Biometric Society, vol. 73(1), pages 20-30, March.
    18. Chatterjee, A. & Lahiri, S. N., 2011. "Bootstrapping Lasso Estimators," Journal of the American Statistical Association, American Statistical Association, vol. 106(494), pages 608-625.
    19. Qing Zhou, 2014. "Monte Carlo Simulation for Lasso-Type Problems by Estimator Augmentation," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(508), pages 1495-1516, December.
    20. Christian Hennig & Willi Sauerbrei, 2019. "Exploration of the variability of variable selection based on distances between bootstrap sample results," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(4), pages 933-963, December.
    21. Hui Zou & Trevor Hastie, 2005. "Addendum: Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(5), pages 768-768, November.
    22. Chenglong Ye & Yi Yang & Yuhong Yang, 2018. "Sparsity Oriented Importance Learning for High-Dimensional Linear Regression," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 113(524), pages 1797-1812, October.
    23. Yang Li & Yuetian Luo & Davide Ferrari & Xiaonan Hu & Yichen Qin, 2019. "Model confidence bounds for variable selection," Biometrics, The International Biometric Society, vol. 75(2), pages 392-403, June.
    24. Hui Zou & Trevor Hastie, 2005. "Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(2), pages 301-320, April.
    25. Bayer, Sebastian, 2018. "Combining Value-at-Risk forecasts using penalized quantile regressions," Econometrics and Statistics, Elsevier, vol. 8(C), pages 56-77.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Xiaorui Zhu & Yichen Qin & Peng Wang, 2023. "Sparsified Simultaneous Confidence Intervals for High-Dimensional Linear Models," Papers 2307.07574, arXiv.org.
    2. Capanu, Marinela & Giurcanu, Mihai & Begg, Colin B. & Gönen, Mithat, 2023. "Subsampling based variable selection for generalized linear models," Computational Statistics & Data Analysis, Elsevier, vol. 184(C).
    3. Dumitrescu, Elena & Hué, Sullivan & Hurlin, Christophe & Tokpavi, Sessi, 2022. "Machine learning for credit scoring: Improving logistic regression with non-linear decision-tree effects," European Journal of Operational Research, Elsevier, vol. 297(3), pages 1178-1192.
    4. Hui Xiao & Yiguo Sun, 2019. "On Tuning Parameter Selection in Model Selection and Model Averaging: A Monte Carlo Study," JRFM, MDPI, vol. 12(3), pages 1-16, June.
    5. Xianyi Wu & Xian Zhou, 2019. "On Hodges’ superefficiency and merits of oracle property in model selection," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 71(5), pages 1093-1119, October.
    6. Diego Vidaurre & Concha Bielza & Pedro Larrañaga, 2013. "A Survey of L1 Regression," International Statistical Review, International Statistical Institute, vol. 81(3), pages 361-387, December.
    7. Faguang Wen & Jiming Jiang & Yihui Luan, 2024. "Model Selection Path and Construction of Model Confidence Set under High-Dimensional Variables," Mathematics, MDPI, vol. 12(5), pages 1-21, February.
    8. Yang, Yuan & McMahan, Christopher S. & Wang, Yu-Bo & Ouyang, Yuyuan, 2024. "Estimation of l0 norm penalized models: A statistical treatment," Computational Statistics & Data Analysis, Elsevier, vol. 192(C).
    9. Tutz, Gerhard & Pößnecker, Wolfgang & Uhlmann, Lorenz, 2015. "Variable selection in general multinomial logit models," Computational Statistics & Data Analysis, Elsevier, vol. 82(C), pages 207-222.
    10. Chuliá, Helena & Garrón, Ignacio & Uribe, Jorge M., 2024. "Daily growth at risk: Financial or real drivers? The answer is not always the same," International Journal of Forecasting, Elsevier, vol. 40(2), pages 762-776.
    11. Peter Bühlmann & Jacopo Mandozzi, 2014. "High-dimensional variable screening and bias in subsequent inference, with an empirical comparison," Computational Statistics, Springer, vol. 29(3), pages 407-430, June.
    12. Tomáš Plíhal, 2021. "Scheduled macroeconomic news announcements and Forex volatility forecasting," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 40(8), pages 1379-1397, December.
    13. Loann David Denis Desboulets, 2018. "A Review on Variable Selection in Regression Analysis," Econometrics, MDPI, vol. 6(4), pages 1-27, November.
    14. Zeyu Bian & Erica E. M. Moodie & Susan M. Shortreed & Sahir Bhatnagar, 2023. "Variable selection in regression‐based estimation of dynamic treatment regimes," Biometrics, The International Biometric Society, vol. 79(2), pages 988-999, June.
    15. Li, Gaorong & Lian, Heng & Feng, Sanying & Zhu, Lixing, 2013. "Automatic variable selection for longitudinal generalized linear models," Computational Statistics & Data Analysis, Elsevier, vol. 61(C), pages 174-186.
    16. Zanhua Yin, 2020. "Variable selection for sparse logistic regression," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 83(7), pages 821-836, October.
    17. Marcelo C. Medeiros & Eduardo F. Mendes, 2015. "l1-Regularization of High-Dimensional Time-Series Models with Flexible Innovations," Textos para discussão 636, Department of Economics PUC-Rio (Brazil).
    18. Pei Wang & Shunjie Chen & Sijia Yang, 2022. "Recent Advances on Penalized Regression Models for Biological Data," Mathematics, MDPI, vol. 10(19), pages 1-24, October.
    19. Zakariya Yahya Algamal & Muhammad Hisyam Lee, 2019. "A two-stage sparse logistic regression for optimal gene selection in high-dimensional microarray data classification," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(3), pages 753-771, September.
    20. Ricardo P. Masini & Marcelo C. Medeiros & Eduardo F. Mendes, 2023. "Machine learning advances for time series forecasting," Journal of Economic Surveys, Wiley Blackwell, vol. 37(1), pages 76-111, February.
