Abstract
One of the most fascinating areas of study in the current economic and financial world is the forecasting of credit risk and the ability to predict a company’s insolvency. Meanwhile, one major challenge in constructing predictive failure models is variable selection. Standard selection methods exist alongside new approaches. In addition, the huge availability of data often implies limitations due to processing time and new high-performance procedures provide tools that can take advantage of parallel processing. In the present paper, different variable selection techniques were explored in the context of applying logistic regression for binary data to a balanced data set including only firms active or in bankruptcy. Models deriving from stepwise selection, the Least Absolute Shrinkage and Selection Operator (LASSO) and an unsupervised method, based on the maximum data variance explained, were compared. Then a non-parametric approach was considered and the selection of variables coming from a single decision tree and a forest of trees is compared and discussed.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Altman, E.I.: Financial ratios, discriminant analysis and the prediction of corporate bankruptcy. J. Finance 23(4), 589–609 (1968)
Amendola, A., Restaino, M., Sensini, L.: Variable selection in default risk models. J. Risk Model Validation 5(1), 3 (2011)
Austin, P.C., Tu, J.V.: Automated variable selection methods for logistic regression produced unstable models for predicting acute myocardial infarction mortality. J. Clin. Epidemiol. 57(11), 1138–1146 (2004)
Banasik, J., Crook, J.N., Thomas, L.C.: Not if but when will borrowers default. J. Oper. Res. Soc. 50(12), 1185–1190 (1999)
Beaver, W.H.: Financial ratios as predictors of failure. Journal of Account. Res. 4, 71–111 (1966)
Bonini, S., Caivano, G.: The survival analysis approach in Basel II credit risk management: modeling danger rates in the loss given default parameter. J. Credit Risk 9(1), 101–118 (2013)
Bunea, F.: Honest variable selection in linear and logistic regression models via \(\ell \)1 and \(\ell \)1+ \(\ell \)2 penalization. Electron. J. Stat. 2, 1153–1194 (2008)
Bursac, Z., Gauss, C.H., Williams, D.K., Hosmer, D.W.: Purposeful selection of variables in logistic regression. Source Code Biol. Med. 3(1), 1–8 (2008)
Cao, R., Vilar, J.M., Devia, A., Veraverbeke, N., Boucher, J.P., Beran, J.: Modelling consumer credit risk via survival analysis. SORT Stat. Oper. Res. Trans. 33(1), 31–47 (2009)
Caroni, C., Pierri, F.: Different causes of closure of small business enterprises: alternative models for competing risks survival analysis. Electron. J. Appl. Stat. Anal. 13(1), 211–228 (2020)
Dreiseitl, S., Ohno-Machado, L.: Logistic regression and artificial neural network classification models: a methodology review. J. Biomed. Inform. 35(5–6), 352–359 (2002)
Du Jardin, P.: Predicting bankruptcy using neural networks and other classification methods: the influence of variable selection techniques on model accuracy. Neurocomputing 73(10), 2047–2060 (2010). https://doi.org/10.1016/j.neucom.2009.11.034, https://www.sciencedirect.com/science/article/pii/S0925231210001098, subspace Learning/Selected papers from the European Symposium on Time Series Prediction
Du Jardin, P.: The influence of variable selection methods on the accuracy of bankruptcy prediction models. Bank. Mark. Invest. 116, 20–39 (2012)
Efron, B., Hastie, T., Johnstone, I., Tibshirani, R.: Least angle regression. Ann. Stat. 32(2), 407–499 (2004)
Fan, J., Li, R.: Variable selection for Cox’s proportional hazards model and frailty model. Ann. Stat. 30(1), 74–99 (2002)
Fan, J., Li, G., Li, R.: An overview on variable selection for survival analysis. In: Contemporary Multivariate Analysis and Design of Experiments: In Celebration of Professor Kai-Tai Fang’s 65th Birthday, pp. 315–336 (2005)
Fu, Z., Parikh, C.R., Zhou, B.: Penalized variable selection in competing risks regression. Lifetime Data Anal. 23(3), 353–376 (2017)
Ghosh, K., Ramteke, M., Srinivasan, R.: Optimal variable selection for effective statistical process monitoring. Comput. Chem. Eng. 60, 260–276 (2014)
He, Z., Tu, W., Wang, S., Fu, H., Yu, Z.: Simultaneous variable selection for joint models of longitudinal and survival outcomes. Biometrics 71(1), 178–187 (2015)
Kiefer, N.M.: Economic duration data and hazard functions. J. Econ. Literature 26(2), 646–679 (1988)
Kim, J., Sohn, I., Jung, S.H., Kim, S., Park, C.: Analysis of survival data with group lasso. Commun. Stat. Simul. Comput. 41(9), 1593–1605 (2012)
King, G., Zeng, L.: Logistic regression in rare events data. Political Anal. 9(2), 137–163 (2001)
Kumar, A., Rao, V.R., Soni, H.: An empirical comparison of neural network and logistic regression models. Mark. Lett. 6(4), 251–263 (1995)
Kundu, S., Mazumdar, M., Ferket, B.: Impact of correlation of predictors on discrimination of risk models in development and external populations. BMC Med. Res. Methodol. 17(1), 1–9 (2017)
Meier, L., Van De Geer, S., Bühlmann, P.: The group lasso for logistic regression. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 70(1), 53–71 (2008)
Mossman, C.E., Bell, G.G., Swartz, L.M., Turtle, H.: An empirical comparison of bankruptcy models. Financial Rev. 33(2), 35–54 (1998)
Narain, B.: Survival analysis and the credit granting decision. In: Thomas, L.C., Crook, J.N., Edelman, D.B. (eds.), Credit Scoring and Credit Control, pp. 109–122. Oxford Univeristy Press (1992)
Ohlson, J.A.: Financial ratios and the probabilistic prediction of bankruptcy. J. Account. Res. 18(1), 109–131 (1980)
Orbis: Orbis. Bureau van Dijk, https://orbis.bvdinfo.com/. Accessed June 2020
Pierri, F., Caroni, C.: Bankruptcy prediction by survival models based on current and lagged values of time-varying financial data. Commun. Stat. Case Stud. Data Anal. Appl. 3(3–4), 62–70 (2017)
Pierri, F., Caroni, C.: Analysing the risk of bankruptcy of firms: survival analysis, competing risks and multistate models. In: Demography of Population Health, Aging and Health Expenditures, pp. 385–394. Springer (2020)
Pierri, F., Stanghellini, E., Bistoni, N.: Risk analysis and retrospective unbalanced data. Revstat-Stat. J. 14(2), 157–169 (2016)
SAS: SAS/STAT® 9.22 User’s Guide. https://support.sas.com/documentation/cdl/en/statug/63347/HTML/default/viewer.htm#logistic_toc.htm. Accessed 19 Nov 2022
SAS: SAS/STAT® 9.22 User’s Guide. https://support.sas.com/documentation/cdl/en/statug/63347/HTML/default/viewer.htm#glmselect_toc.htm. Accessed 19 Nov 2022
SAS: SAS/STAT® 9.22 User’s Guide. https://support.sas.com/documentation/cdl/en/statug/68162/HTML/default/viewer.htm#statug_hpsplit_overview.htm. Accessed 19 Nov 2022
SAS: SAS® Enterprise MinerTM: High-Performance Procedures. https://documentation.sas.com/doc/en/emhpprcref/14.2/emhpprcref_hpreduce_details01.htm (2016). Accessed 19 Nov 2022
SAS Institute Inc., Cary, NC: SAS® Enterprise MinerTM 15.2: High-Performance Procedures, last updated: August 18, 2022
Shumway, T.: Forecasting bankruptcy more accurately: a simple hazard model. J. Bus. 74(1), 101–124 (2001)
Stepanova, M., Thomas, L.: Survival analysis methods for personal loan data. Oper. Res. 50(2), 277–289 (2002)
Sun, K., Huang, S.H., Wong, D.S.H., Jang, S.S.: Design and application of a variable selection method for multilayer perceptron neural network with lasso. IEEE Trans. Neural Netw. Learn. Syst. 28(6), 1386–1396 (2016)
Tang, Z., Shen, Y., Zhang, X., Yi, N.: The spike-and-slab lasso Cox model for survival prediction and associated genes detection. Bioinformatics 33(18), 2799–2807 (2017)
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B (Methodol.) 58(1), 267–288 (1996)
Tibshirani, R.: The lasso method for variable selection in the Cox model. Stat. Med. 16(4), 385–395 (1997)
Tibshirani, R.: Regression shrinkage and selection via the lasso: a retrospective. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 73(3), 273–282 (2011)
Zellner, D., Keller, F., Zellner, G.E.: Variable selection in logistic regression models. Commun. Stat. Simul. Comput. 33(3), 787–805 (2004)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Pierri, F. (2023). Variable Selection in Binary Logistic Regression for Modelling Bankruptcy Risk. In: Kitsos, C.P., Oliveira, T.A., Pierri, F., Restaino, M. (eds) Statistical Modelling and Risk Analysis. ICRA 2022. Springer Proceedings in Mathematics & Statistics, vol 430. Springer, Cham. https://doi.org/10.1007/978-3-031-39864-3_12
Download citation
DOI: https://doi.org/10.1007/978-3-031-39864-3_12
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-39863-6
Online ISBN: 978-3-031-39864-3
eBook Packages: Mathematics and StatisticsMathematics and Statistics (R0)