Does Machine Learning Offer Added Value Vis-à-Vis Traditional Statistics? An Exploratory Study on Retirement Decisions Using Data from the Survey of Health, Ageing, and Retirement in Europe (SHARE)
Figure 1. Schematic summary of the literature review.
Figure 2. AUC comparison of simulated Cox regressions and random forests (N = 500) when t = 1: (a) AUC comparison for t = 1, u = 2; (b) AUC comparison for t = 1, u = 3; (c) AUC comparison for t = 1, u = 4; (d) AUC comparison for t = 1, u = 5; (e) AUC comparison for t = 1, u = 6.
Figure 3. AUC comparison of simulated Cox regressions and random forests (N = 500) when t = 2: (a) AUC comparison for t = 2, u = 3; (b) AUC comparison for t = 2, u = 4; (c) AUC comparison for t = 2, u = 5; (d) AUC comparison for t = 2, u = 6.
Figure 4. AUC comparison of simulated Cox regressions and random forests (N = 500) when t = 3: (a) AUC comparison for t = 3, u = 4; (b) AUC comparison for t = 3, u = 5; (c) AUC comparison for t = 3, u = 6.
Figure 5. AUC comparison of simulated Cox regressions and random forests (N = 500) when t = 4: (a) AUC comparison for t = 4, u = 5; (b) AUC comparison for t = 4, u = 6.
Figure 6. AUC comparison of simulated Cox regressions and random forests (N = 500) when t = 5 and u = 6.
Abstract
1. Introduction
- (a) Theoretically, it attempts to shed new light on the retirement puzzle by addressing the interplay of newly identified factors influencing the decision to retire (cf. supra);
- (b) Methodologically, it intends to explore a new technique for the study of large databases, and to compare its performance with that of existing techniques;
- (c) Empirically, it attempts to probe the actual impact of the newly identified variables in a longitudinal setting.
2. State of the Art
2.1. Retirement
2.2. Machine Learning
2.3. Machine Learning and EHA
2.4. Random Forests and Retirement Data
3. Materials and Methods
3.1. The Data
3.2. Cox Regressions
3.3. Machine Learning Algorithms
4. Results
5. Discussion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A
Variable Name | Description | Variable Type | Values | Time Dimension |
---|---|---|---|---|
country | Country where the respondent resides | Nominal | Austria; Germany; Sweden; Netherlands; Spain; Italy; France; Denmark; Switzerland; Belgium | None |
age | Respondent’s age at the beginning of the measurements (2004) | Continuous | 38–82 | Not included in survey |
gender | Respondent’s gender | Binary | 1 = Male; 2 = Female | None |
sphus | Self-perceived health status | Ordinal | 1 = Poor; 2 = Fair; 3 = Good; 4 = Very good; 5 = Excellent | Yes |
grchild_bin | Presence of grandchildren, constructed on the basis of ngrchild from the original survey data | Binary | 0 = No grandchildren; 1 = One or more grandchildren | Yes |
rent_or_mortgage | Constructed on the basis of the “Owner, tenant or rent free” (otrf) and “Mortgage” (mort) variables from the original SHARE data | Binary | 0 = Respondent pays no rent or mortgage; 1 = Respondent pays rent or mortgage (coded 1 if mort > 0 or otrf is 3 or 4) | Yes |
yedu | Number of years in education in 2004 | Continuous | 0–21 | No |
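The coding rules for the two constructed variables in the table above can be sketched in code. The following is a minimal Python illustration, not the authors' actual preprocessing. It assumes that the otrf condition means the otrf code equals 3 or 4, that grchild_bin is 1 whenever ngrchild is at least 1, and that missing inputs arrive as `None`. The function names and the missing-value handling are hypothetical; the source does not describe them.

```python
from typing import Optional

# Assumption: the rule "otrf is (3,4)" is read as "otrf equals 3 or 4".
PAYING_OTRF_CODES = {3, 4}


def grchild_bin(ngrchild: Optional[int]) -> Optional[int]:
    """Return 1 for one or more grandchildren, 0 for none."""
    if ngrchild is None:  # hypothetical missing-value handling
        return None
    return 1 if ngrchild >= 1 else 0


def rent_or_mortgage(mort: Optional[float], otrf: Optional[int]) -> Optional[int]:
    """Return 1 if mort > 0 or otrf is 3 or 4, otherwise 0."""
    if mort is None and otrf is None:  # hypothetical missing-value handling
        return None
    has_mortgage = mort is not None and mort > 0
    pays_by_tenure = otrf in PAYING_OTRF_CODES
    return 1 if (has_mortgage or pays_by_tenure) else 0


if __name__ == "__main__":
    # Toy records (invented for illustration): (ngrchild, mort, otrf)
    records = [(0, 0.0, 1), (2, 0.0, 3), (1, 85000.0, 1)]
    for ngrchild, mort, otrf in records:
        print(grchild_bin(ngrchild), rent_or_mortgage(mort, otrf))
```

Keeping the otrf codes in a named set makes the assumption easy to revise if the SHARE codebook assigns different meanings to those categories.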
References
1. Fisher, G.G.; Chaffee, D.S.; Sonnega, A. Retirement Timing: A Review and Recommendations for Future Research. Work. Aging Retire. 2016, 2, 230–261.
2. Scharn, M.; Sewdas, R.; Boot, C.R.L.; Huisman, M.; Lindeboom, M.; Van Der Beek, A.J. Domains and Determinants of Retirement Timing: A Systematic Review of Longitudinal Studies. BMC Public Health 2018, 18, 1083.
3. Varian, H.R. Big Data: New Tricks for Econometrics. J. Econ. Perspect. 2014, 28, 3–28.
4. Athey, S. Machine Learning and Causal Inference for Policy Evaluation. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining—KDD’15, Sydney, Australia, 10–13 August 2015; pp. 5–6.
5. Athey, S.; Imbens, G.W. Machine Learning Methods That Economists Should Know About. Annu. Rev. Econ. 2019, 11, 685–725.
6. Hindman, M. Building Better Models: Prediction, Replication, and Machine Learning in the Social Sciences. Ann. Am. Acad. Political Soc. Sci. 2015, 659, 48–62.
7. Boelaert, J.; Ollion, E. The Great Regression. Machine Learning, Econometrics, and the Future of Quantitative Social Sciences. Rev. Fr. Sociol. 2018, 59, 475–506.
8. MacLeod, W.B. Viewpoint: The Human Capital Approach to Inference. Can. J. Econ. 2017, 50, 5–39.
9. Kim, D.; Shin, S. The Economic Explainability of Machine Learning and Standard Econometric Models—An Application to the US Mortgage Risk. Int. J. Strateg. Prop. Manag. 2021, 25, 396–412.
10. Hansen, J.V.; Nelson, R.D. Forecasting and Recombining Time-Series Components by Using Neural Networks. J. Oper. Res. Soc. 2003, 54, 307–317.
11. Cerniglia, J.A.; Fabozzi, F.T. Selecting Computational Models for Asset Management: Financial Econometrics versus Machine Learning-Is There a Conflict? J. Portf. Manag. 2020, 47, 107–118.
12. Sofianos, E.; Gogas, P.; Papadimitriou, T. Mind the Gap: Forecasting Euro-Area Output Gaps with Machine Learning. Appl. Econ. Lett. 2021, 1–5.
13. Liu, Y.; Xie, T. Machine Learning versus Econometrics: Prediction of Box Office. Appl. Econ. Lett. 2019, 26, 124–130.
14. Gogas, P.; Papadimitriou, T.; Sofianos, E. Forecasting Unemployment in the Euro Area with Machine Learning. J. Forecast. 2021.
15. Plakandaras, V.; Papadimitriou, T.; Gogas, P. Forecasting Transportation Demand for the US Market. Transp. Res. Part A Policy Pract. 2019, 126, 195–214.
16. Ramsey, S.M.; Bergtold, J.S. Examining Inferences from Neural Network Estimators of Binary Choice Processes: Marginal Effects, and Willingness-to-Pay. Comput. Econ. 2021, 58, 1137–1165.
17. Steinkraus, A. Estimating Treatment Effects With Artificial Neural Nets—A Comparison to Synthetic Control Method. Econ. Bull. 2019, 39, 2778–2791.
18. Arora, P.; Boyne, D.; Slater, J.J.; Gupta, A.; Brenner, D.R.; Druzdzel, M.J. Bayesian Networks for Risk Prediction Using Real-World Data: A Tool for Precision Medicine. Value Health 2019, 22, 439–445.
19. Hansen, J.V.; McDonald, J.B.; Stice, J.D. Artificial Intelligence and Generalized Qualitative Response Models—An Empirical Test on 2 Audit Decision-Making Domains. Decis. Sci. 1992, 23, 708–723.
20. Malhotra, A. A Hybrid Econometric-Machine Learning Approach for Relative Importance Analysis: Prioritizing Food Policy. Eurasian Econ. Rev. 2021, 11, 549–581.
21. Chakrabarti, P.; Frye, M. A Mixed-Methods Framework for Analyzing Text Data: Integrating Computational Techniques with Qualitative Methods in Demography. Demogr. Res. 2017, 37, 1351–1382.
22. Vijayakumar, R.; Cheung, M.W.-L. Assessing Replicability of Machine Learning Results: An Introduction to Methods on Predictive Accuracy in Social Sciences. Soc. Sci. Comput. Rev. 2021, 39, 768–801.
23. Sohnesen, T.P.; Stender, N. Is Random Forest a Superior Methodology for Predicting Poverty? An Empirical Assessment. Poverty Public Policy 2017, 9, 118–133.
24. Gogas, P.; Papadimitriou, T. Machine Learning in Economics and Finance. Comput. Econ. 2021, 57, 1–4.
25. Börsch-Supan, A.; Brandt, M.; Hunkler, C.; Kneip, T.; Korbmacher, J.; Malter, F.; Schaan, B.; Stuck, S.; Zuber, S.; SHARE Central Coordination Team. Data Resource Profile: The Survey of Health, Ageing and Retirement in Europe (SHARE). Int. J. Epidemiol. 2013, 42, 992–1001.
26. Börsch-Supan, A. Survey of Health, Ageing and Retirement in Europe (SHARE) Wave 1; Release Version: 7.1.0; SHARE-ERIC: Munich, Germany, 2020.
27. Börsch-Supan, A. Survey of Health, Ageing and Retirement in Europe (SHARE) Wave 4; Release Version: 7.1.0; SHARE-ERIC: Munich, Germany, 2020.
28. Börsch-Supan, A. Survey of Health, Ageing and Retirement in Europe (SHARE) Wave 5; Release Version: 7.1.0; SHARE-ERIC: Munich, Germany, 2020.
29. Börsch-Supan, A. Survey of Health, Ageing and Retirement in Europe (SHARE) Wave 6; Release Version: 7.1.0; SHARE-ERIC: Munich, Germany, 2020.
30. Börsch-Supan, A. Survey of Health, Ageing and Retirement in Europe (SHARE) Wave 7; Release Version: 7.1.0; SHARE-ERIC: Munich, Germany, 2020.
31. Börsch-Supan, A. Survey of Health, Ageing and Retirement in Europe (SHARE) Wave 2; Release Version: 7.1.0; SHARE-ERIC: Munich, Germany, 2020.
32. Ishwaran, H.; Kogalur, U.B. Package ‘RandomForestSRC’; R Foundation for Statistical Computing: Vienna, Austria, 2021.
33. OECD. The Transition from Work to Retirement; Social Policy Studies: Paris, France, 1995.
34. Dorn, D.; Sousa-Poza, A. The Determinants of Early Retirement in Switzerland. Swiss J. Econ. Stat. 2005, 141, 247–283.
35. Wang, M.; Wanberg, C.R. 100 Years of Applied Psychology Research on Individual Careers: From Career Management to Retirement. J. Appl. Psychol. 2017, 102, 546–563.
36. Heckman, J.J.; Vytlacil, E. Structural Equations, Treatment Effects and Econometric Policy Evaluation; NBER Working Papers Series; NBER: Cambridge, MA, USA, 2005.
37. Gustman, A.L.; Steinmeier, T.L. How Changes in Social Security Affect Recent Retirement Trends; NBER Working Papers Series; NBER: Cambridge, MA, USA, 2008.
38. Berkovec, J.; Stern, S. Job Exit Behavior of Older Men. Econometrica 1991, 59, 189–210.
39. Gustman, A.L.; Steinmeier, T.L. A Structural Retirement Model; NBER Working Papers Series; NBER: Cambridge, MA, USA, 1983.
40. Gustman, A.L.; Steinmeier, T.L. Retirement in a Family Context: A Structural Model for Husbands and Wives; NBER Working Papers Series; NBER: Cambridge, MA, USA, 1994.
41. Gustman, A.L.; Steinmeier, T.L. Retirement in Dual-Career Families: A Structural Model. J. Labor Econ. 2000, 18, 503–545.
42. Gustman, A.L.; Steinmeier, T.L. The Social Security Early Entitlement Age in a Structural Model of Retirement and Wealth; NBER Working Papers; NBER: Cambridge, MA, USA, 2002.
43. Casanova, M. Happy Together: A Structural Model of Couples’ Joint Retirement Choices. Available online: http://www.econ.ucla.edu/casanova/Files/Casanova_joint_ret.pdf (accessed on 29 July 2019).
44. Rust, J.; Phelan, C. How Social Security and Medicare Affect Retirement Behavior in a World of Incomplete Markets. Econometrica 1997, 65, 781–831.
45. Laun, T.; Markussen, S.; Christian, T.; Wallenius, J. Health, Longevity and Retirement Reform. J. Econ. Dyn. Control. 2019, 103, 123–157.
46. Dahl, S.-A.; Nilsen, O.A.; Vaage, K. Work or Retirement? Exit Routes for Norwegian Elderly; IZA Discussion Papers: Bonn, Germany, 1999.
47. Hospido, L.; Zamarro, G. Retirement Patterns of Couples in Europe. IZA J. Eur. Labor Stud. 2014, 3, 12.
48. Sirven, N.; Barnay, T. Expectations, Loss Aversion and Retirement Decisions in the Context of the 2009 Crisis in Europe. Int. J. Manpow. 2017, 38, 25–44.
49. Manoli, D.; Mullen, K.J.; Wagner, M. Policy Variation, Labor Supply Elasticities, and a Structural Model of Retirement. Econ. Inq. 2015, 53, 1702–1717.
50. Manoli, D.S.; Weber, A. Nonparametric Evidence on the Effects of Financial Incentives on Retirement Decisions; NBER Working Papers Series; NBER: Cambridge, MA, USA, 2011.
51. Asch, B.; Haider, S.J.; Zissimopoulos, J. Financial Incentives and Retirement: Evidence from Federal Civil Service Workers. J. Public Econ. 2005, 89, 427–440.
52. Stock, J.H.; Wise, D.A. Pensions, the Option Value of Work, and Retirement. Econometrica 1990, 58, 1151–1180.
53. Lumsdaine, R.L.; Stock, J.H.; Wise, D.A. Three Models of Retirement: Computational Complexity versus Predictive Validity. In Topics in the Economics of Aging; Wise, D.A., Ed.; University of Chicago Press: Chicago, IL, USA, 1992; pp. 21–60.
54. Gruber, J.; Wise, D.A. Social Security Programs and Retirement Around the World: Micro Estimation; NBER Working Papers Series; NBER: Cambridge, MA, USA, 2002.
55. van Sonsbeek, J.-M. Micro Simulations on the Effects of Ageing-Related Policy Measures: The Social Affairs Department of the Netherlands Ageing and Pensions Model. Int. J. Microsimul. 2011, 4, 72–99.
56. Mazzaferro, C.; Morciano, M. CAPP DYN: A Dynamic Micro-Simulation Model for the Italian Social Security System; CAPPaper: Singapore, 2012.
57. Hanappi, T.; Hofer, H.; Müllbacher, S.; Winter-Ebmer, R. IREA. IHS Microsimulation Model for Retirement Behaviour in Austria; Final Report; Institute for Advanced Studies: Vienna, Austria, 2012.
58. Boersch-Supan, A.H. Incentive Effects of Social Security under an Uncertain Disability Option. In Themes in the Economics of Aging; Wise, D.A., Ed.; University of Chicago Press: Chicago, IL, USA, 2001; pp. 281–310.
59. Belloni, M.; Alessie, R. Retirement Choices in Italy: What an Option Value Model Tells Us. Oxf. Bull. Econ. Stat. 2013, 75, 499–527.
60. Samwick, A.; Wise, D.A. Option Value Estimation with Health and Retirement Study Data. In Labor Markets and Firm Benefit Policies in Japan and the United States; Ogura, S., Tachibanaki, T., Wise, D.A., Eds.; University of Chicago Press: Chicago, IL, USA, 2003; pp. 205–228.
61. Samwick, A.A. New Evidence on Pensions, Social Security, and the Timing of Retirement; NBER Working Papers Series; NBER: Cambridge, MA, USA, 1998.
62. Berkel, B.; Boersch-Supan, A. Pension Reform in Germany: The Impact on Retirement Decisions; NBER Working Papers Series; NBER: Cambridge, MA, USA, 2003.
63. Topa, G.; Depolo, M.; Alcover, C.M. Early Retirement: A Meta-Analysis of Its Antecedent and Subsequent Correlates. Front. Psychol. 2018, 8, 2157.
64. Feldman, D.C. The Decision to Retire Early: A Review and Conceptualization. Acad. Manag. Rev. 1994, 19, 285–311.
65. Kim, M.; Beehr, T.A. Retirement from Three Perspectives: Individuals, Organizations, and Society. In Aging and Work in the 21st Century; Shultz, K.S., Adams, G.A., Eds.; Routledge: New York, NY, USA, 2019; pp. 273–291.
66. De Preter, H.; Van Looy, D.; Mortelmans, D. Individual and Institutional Push and Pull Factors as Predictors of Retirement Timing in Europe: A Multilevel Analysis. J. Aging Stud. 2013, 27, 299–307.
67. Topa, G.; Antonio, J.; Depolo, M.; Alcover, C.; Morales, J.F. Antecedents and Consequences of Retirement Planning and Decision-Making: A Meta-Analysis and Model. J. Vocat. Behav. 2009, 75, 38–55.
68. Sundstrup, E.; Thorsen, S.V.; Rugulies, R.; Larsen, M.; Thomassen, K.; Andersen, L.L. Importance of the Working Environment for Early Retirement: Prospective Cohort Study with Register Follow-Up. Int. J. Environ. Res. Public Health 2021, 18, 9817.
69. Boissonneault, M.; Mulders, J.O.; Turek, K.; Carriere, Y. A Systematic Review of Causes of Recent Increases in Ages of Labor Market Exit in OECD Countries. PLoS ONE 2020, 15, e0231897.
70. Boot, C.R.L.; Scharn, M.; Van Der Beek, A.J.; Andersen, L.L.; Elbers, C.T.M.; Lindeboom, M. Effects of Early Retirement Policy Changes on Working until Retirement: Natural Experiment. Int. J. Environ. Res. Public Health 2019, 16, 3895.
71. Fischer, J.A.V.; Sousa-Poza, A. The Institutional Determinants of Early Retirement in Europe; Discussion Papers; Department of Economics, University of St. Gallen: St. Gallen, Switzerland, 2006.
72. Van Den Berg, T.; Schuring, M.; Avendano, M.; Mackenbach, J.; Burdorf, A. The Impact of Ill Health on Exit from Paid Employment in Europe among Older Workers. Occup. Environ. Med. 2010, 67, 845–852.
73. Fleischmann, M. Should I Stay or Should I Go? A Workplace Perspective on Older Persons’ Labour Market Participation; Erasmus Universiteit Rotterdam: Rotterdam, The Netherlands, 2014.
74. de Wind, A.; Geuskens, G.A.; Ybema, J.F.; Bongers, P.M.; Van Der Beek, A.J. The Role of Ability, Motivation, and Opportunity to Work in the Transition from Work to Early Retirement—Testing and Optimizing the Early Retirement Model. Scand. J. Work. Environ. Health 2015, 41, 24–35.
75. Van Der Zwaan, G.L.; Hengel, K.M.O.; Sewdas, R.; De Wind, A.; Steenbeek, R. The Role of Personal Characteristics, Work Environment and Context in Working beyond Retirement: A Mixed-Methods Study. Int. Arch. Occup. Environ. Health 2019, 92, 535–549.
76. Böckerman, P.; Ilmakunnas, P. Do Good Working Conditions Make You Work Longer? Analyzing Retirement Decisions Using Linked Survey and Register Data. J. Econ. Ageing 2020, 17, 100192.
77. Trentini, M. Retirement Timing in Italy: Rising Age and the Advantages of a Stable Working Career. Ageing Soc. 2021, 41, 1878–1896.
78. Mäcken, J.; Präg, P.; Hess, M. Educational Inequalities in Labor Market Exit of Older Workers in 15 European Countries. J. Soc. Policy 2021, 1–25.
79. Hagan, R.; Jones, A.M.; Rice, N. Health and Retirement in Europe. Int. J. Environ. Res. Public Health 2009, 6, 2676–2695.
80. van der Mark-Reeuwijk, K.G. Determinants of Exit from Paid Employment. Ph.D. Thesis, Erasmus University Rotterdam, Rotterdam, The Netherlands, 2016.
81. Van Bavel, J.; De Winter, T. Becoming a Grandparent and Early Retirement in Europe. Eur. Sociol. Rev. 2013, 29, 1295–1308.
82. De Breij, S.; Huisman, M.; Deeg, D.J.H. Educational Differences in Macro-Level Determinants of Early Exit from Paid Work: A Multilevel Analysis of 14 European Countries. Eur. J. Ageing 2020, 17, 217–227.
83. Radl, J.; Himmelreicher, R.K. The Influence of Marital Status and Spousal Employment on Retirement Behavior in Germany and Spain. Res. Aging 2015, 37, 361–387.
84. Bertogg, A.; Strauß, S.; Vandecasteele, L. Linked Lives, Linked Retirement? Relative Income Differences within Couples and Gendered Retirement Decisions in Europe. Adv. Life Course Res. 2021, 47, 100380.
85. Hoven, H.; Dragano, N.; Blane, D.; Wahrendorf, M. Early Adversity and Late Life Employment History—A Sequence Analysis Based on SHARE. Work. Aging Retire. 2018, 4, 238–250.
86. Radl, J. Labour Market Exit and Social Stratification in Western Europe: The Effects of Social Class and Gender on the Timing of Retirement. Eur. Sociol. Rev. 2013, 29, 654–668.
87. Hofäcker, D.; Schröder, H.; Li, Y.; Flynn, M. Trends and Determinants of Work-Retirement Transitions under Changing Institutional Conditions: Germany, England and Japan Compared. J. Soc. Policy 2015, 45, 39–64.
88. Einav, L.; Levin, J. Economics in the Age of Big Data. Science 2014, 346, 6210.
89. Kitchin, R. The Opportunities, Challenges and Risks of Big Data for Official Statistics. Stat. J. IAOS 2015, 31, 471–481.
90. Witten, I.H.; Frank, E. Data Mining: Practical Machine Learning Tools and Techniques, 3rd ed.; Morgan Kaufmann Publishers: San Francisco, CA, USA, 2005; ISBN 0080890369.
91. Hassani, H.; Saporta, G.; Silva, E.S. Data Mining and Official Statistics: The Past, the Present and the Future. Big Data 2014, 2, 34–43.
92. Athey, S.; Imbens, G. The State of Applied Econometrics—Causality and Policy Evaluation. J. Econ. Perspect. 2016, 31, 3–32.
93. Mullainathan, S.; Spiess, J. Machine Learning: An Applied Econometric Approach. J. Econ. Perspect. 2017, 31, 87–106.
94. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning. Data Mining, Inference, and Prediction, 2nd ed.; Springer: New York, NY, USA, 2009.
95. Seligman, B.; Tuljapurkar, S.; Rehkopf, D. Machine Learning Approaches to the Social Determinants of Health in the Health and Retirement Study. SSM-Popul. Health 2018, 4, 95–99.
96. Witten, I.H.; Frank, E.; Hall, M.A.; Pal, C.J. Data Mining; Morgan Kaufmann: Burlington, MA, USA, 2017; ISBN 978-0-12-804291-5.
97. Wang, P.; Li, Y.; Reddy, C.K. Machine Learning for Survival Analysis: A Survey. ACM Comput. Surv. 2019, 51, 1–36.
98. Gepp, A.; Kumar, K. Predicting Financial Distress: A Comparison of Survival Analysis and Decision Tree Techniques. Procedia Comput. Sci. 2015, 54, 396–404.
99. Bou-Hamad, I.; Larocque, D.; Ben-Ameur, H. Forests with Time-Varying Covariates: Application to Bankruptcy Data; Les Cahiers du GERAD; GERAD: Montreal, QC, Canada, 2009.
100. Werpachowska, A. Forecasting the Impact of State Pension Reforms in Post-Brexit England and Wales Using Microsimulation and Deep Learning. In Proceedings of the PenCon 2018 Pensions Conference, Lodz, Poland, 19–20 April 2018; pp. 120–132.
101. Zhu, X. Forecasting Employee Turnover in Large Organizations. Ph.D. Thesis, University of Tennessee, Knoxville, TN, USA, 2016.
102. Bou-Hamad, I.; Larocque, D.; Ben-Ameur, H. A Review of Survival Trees. Stat. Surv. 2011, 5, 44–71.
103. Breiman, L.; Friedman, J.; Olshen, R.A.; Stone, C.J. Classification and Regression Trees; Wadsworth and Brooks/Cole: Monterey, CA, USA, 1984.
104. Ishwaran, H.; Kogalur, U.B. RandomForestSRC; R Foundation for Statistical Computing: Vienna, Austria, 2019.
105. Shin, S.; Austin, P.C.; Ross, H.J.; Abdel-Qadir, H.; Freitas, C.; Tomlinson, G.; Chicco, D.; Mahendiran, M.; Lawler, P.R.; Billia, F.; et al. Machine Learning vs. Conventional Statistical Models for Predicting Heart Failure Readmission and Mortality. ESC Heart Fail. 2021, 8, 106–115.
106. Datema, F.R.; Moya, A.; Krause, P.; Back, T.; Willmes, L.; Langeveld, T.; Baatenburg de Jong, R.J.; Blom, H.M. Novel Head and Neck Cancer Survival Analysis Approach: Random Survival Forests versus Cox Proportional Hazards Regression. Head Neck 2012, 34, 50–58.
107. De Vries, B.C.S.; Hegeman, J.H.; Nijmeijer, W.; Geerdink, J.; Seifert, C.; Groothuis-Oudshoorn, C.G.M. Comparing Three Machine Learning Approaches to Design a Risk Assessment Tool for Future Fractures: Predicting a Subsequent Major Osteoporotic Fracture in Fracture Patients with Osteopenia and Osteoporosis. Osteoporos. Int. 2021, 32, 437–449.
108. Lowsky, D.J.; Ding, Y.; Lee, D.K.K.; McCulloch, C.E.; Ross, L.F.; Thistlethwaite, J.R.; Zenios, S.A. A K-Nearest Neighbors Survival Probability Prediction Method. Stat. Med. 2013, 32, 2062–2069.
109. Moncada-Torres, A.; van Maaren, M.C.; Hendriks, M.P.; Siesling, S.; Geleijnse, G. Explainable Machine Learning Can Outperform Cox Regression Predictions and Provide Insights in Breast Cancer Survival. Sci. Rep. 2021, 11, 6968.
110. Prosperi, M.C.F.; Di Giambenedetto, S.; Fanti, I.; Meini, G.; Bruzzone, B.; Callegaro, A.; Penco, G.; Bagnarelli, P.; Micheli, V.; Paolini, E.; et al. A Prognostic Model for Estimating the Time to Virologic Failure in HIV-1 Infected Patients Undergoing a New Combination Antiretroviral Therapy Regimen. BMC Med. Inform. Decis. Mak. 2011, 11, 40.
111. Ptak-Chmielewska, A.; Matuszyk, A. Application of the Random Survival Forests Method in the Bankruptcy Prediction for Small and Medium Enterprises. Argum. Econ. 2020, 44, 127–142.
112. Tse, G.; Lee, S.; Zhou, J.; Liu, T.; Wong, I.C.K.; Mak, C.; Mok, N.S.; Jeevaratnam, K.; Zhang, Q.; Cheng, S.H.; et al. Territory-Wide Chinese Cohort of Long QT Syndrome: Random Survival Forest and Cox Analyses. Front. Cardiovasc. Med. 2021, 8, 608592.
113. Tsiatis, A.A. Semiparametric Theory and Missing Data; Springer: New York, NY, USA, 2006.
114. Fu, W.; Simonoff, J.S. Survival Trees for Left-Truncated and Right-Censored Data, with Application to Time-Varying Covariate Data. Biostatistics 2017, 18, 352–369.
115. Therneau, T.; Crowson, C.; Atkinson, E. Using Time Dependent Covariates and Time Dependent Coefficients in the Cox Model. Available online: https://cran.r-project.org/web/packages/survival/vignettes/timedep.pdf (accessed on 1 July 2021).
116. Vock, D.M.; Wolfson, J.; Bandyopadhyay, S.; Adomavicius, G.; Johnson, P.E.; Vazquez-Benitez, G.; O’Connor, P.J. Adapting Machine Learning Techniques to Censored Time-to-Event Health Record Data: A General-Purpose Approach Using Inverse Probability of Censoring Weighting. J. Biomed. Inform. 2016, 61, 119–131.
117. Klein, J.P.; Moeschberger, M.L. Survival Analysis. Techniques for Censored and Truncated Data; Springer: New York, NY, USA, 2003.
118. Su, Y.R.; Wang, J.L. Modeling Left-Truncated and Right-Censored Survival Data with Longitudinal Covariates. Ann. Stat. 2012, 40, 1465–1488.
119. Moradian, H. Three Essays on Survival Forests. Ph.D. Thesis, HEC Montreal, Montréal, QC, Canada, 2017.
120. Bou-Hamad, I.; Larocque, D.; Ben-Ameur, H. Discrete-Time Survival Trees and Forests with Time-Varying Covariates. Stat. Model. Int. J. 2011, 11, 429–446.
121. Schmid, M.; Küchenhoff, H.; Hoerauf, A.; Tutz, G. A Survival Tree Method for the Analysis of Discrete Event Times in Clinical and Epidemiological Studies. Stat. Med. 2016, 35, 734–751.
122. Steyerberg, E.W. Clinical Prediction Models. A Practical Approach to Development, Validation, and Updating; Springer Nature: Cham, Switzerland, 2019; ISBN 9783030163983.
123. Dessai, S.; Patil, V. Testing and Interpreting Assumptions of COX Regression Analysis. Cancer Res. Stat. Treat. 2019, 2, 108–111.
124. Therneau, T.; Lumley, T.; Atkinson, E.; Crowson, C. Package “Survival”; R Foundation for Statistical Computing: Vienna, Austria, 2021.
125. Witten, I.H.; Frank, E. Data Mining. Practical Machine Learning Tools and Techniques, 2nd ed.; Morgan Kaufmann: San Francisco, CA, USA, 2005; ISBN 0120884070.
126. Kamarudin, A.N.; Cox, T.; Kolamunnage-Dona, R. Time-Dependent ROC Curve Analysis in Medical Research: Current Methods and Applications. BMC Med. Res. Methodol. 2017, 17, 53.
127. Brier, G.W. Verification of Forecasts Expressed in Terms of Probability. Mon. Weather Rev. 1950, 78, 2–4.
128. Nagelkerke, N.J.D. A Note on a General Definition of the Coefficient of Determination. Biometrika 1991, 78, 691–692.
129. Pencina, M.J.; D’Agostino, R.B. Overall C as a Measure of Discrimination in Survival Analysis: Model Specific Population Value and Confidence Interval Estimation. Stat. Med. 2004, 23, 2109–2123.
130. Blanche, P.; Kattan, M.W.; Gerds, T.A. The C-Index Is Not Proper for the Evaluation of t-Year Predicted Risks. Biostatistics 2019, 20, 347–357.
131. Kvamme, H.; Borgan, Ø. The Brier Score under Administrative Censoring: Problems and Solutions. arXiv 2019, arXiv:1912.085819.
132. Gerds, T.A.; Schumacher, M. Consistent Estimation of the Expected Brier Score in General Survival Models with Right-Censored Event Times. Biom. J. 2006, 48, 1029–1040.
133. Gerber, G.; Le Faou, Y.; Lopez, O.; Trupin, M. The Impact of Churn on Client Value in Health Insurance, Evaluation Using a Random Forest Under Various Censoring Mechanisms. J. Am. Stat. Assoc. 2020, 1–12.
134. Morris, T.P.; White, I.R.; Crowther, M.J. Using Simulation Studies to Evaluate Statistical Methods. Stat. Med. 2019, 38, 2074–2102.
135. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
136. Reeuwijk, K.G.; van Klaveren, D.; van Rijn, R.M.; Burdorf, A.; Robroek, S.J.W. The Influence of Poor Health on Competing Exit Routes from Paid Employment among Older Workers in 11 European Countries. Scand. J. Work. Environ. Health 2017, 43, 24–33.
Biblioshiny most frequent words, by search field
Search Term | Author’s Keywords | Keywords Plus | Abstract |
---|---|---|---|
Neural networks | 8 | 19 | 27 |
Forest | 14 | 7 | 61 |
SVM | 7 | 3 | 28 |
Bayesian | 8 | 2 | 23 |
Boosting | 3 | 0 | 15 |
Bagging | 0 | 0 | 0 |
Source | Topic | Comparison Criterion for Model Performance | Simulations | Best Performance |
---|---|---|---|---|
[106] | Cancer survival | Harrell’s concordance index (C-index) | No | Cox |
[107] | Osteoporotic fractures | C-index | No | Cox |
[108] | Kidney transplants | Integrated prediction error curve (IPEC) | Yes | Random survival forests |
[109] | Cancer survival | C-index | No | Machine-learning-based models |
[110] | HIV treatment | C-index | No | Random survival forests |
[111] | Bankruptcy | C-index | No | Random survival forests |
[112] | Congenital heart channelopathy | Brier score | No | Random survival forests |

Retirement Status

Covariate | Censored (N = 884) | Retired (N = 1998) | Total (N = 2882) |
---|---|---|---|
Gender | |||
Male | 437 (49.4%) | 1111 (55.6%) | 1548 (53.7%) |
Female | 447 (50.6%) | 887 (44.4%) | 1334 (46.3%) |
Country | |||
Austria | 12 (1.4%) | 101 (5.1%) | 113 (3.9%) |
Germany | 92 (10.4%) | 196 (9.8%) | 288 (10.0%) |
Sweden | 145 (16.4%) | 358 (17.9%) | 503 (17.5%) |
Netherlands | 165 (18.7%) | 156 (7.8%) | 321 (11.1%) |
Spain | 33 (3.7%) | 109 (5.5%) | 142 (4.9%) |
Italy | 39 (4.4%) | 166 (8.3%) | 205 (7.1%) |
France | 95 (10.7%) | 252 (12.6%) | 347 (12.0%) |
Denmark | 110 (12.4%) | 214 (10.7%) | 324 (11.2%) |
Switzerland | 69 (7.8%) | 124 (6.2%) | 193 (6.7%) |
Belgium | 124 (14.0%) | 322 (16.1%) | 446 (15.5%) |
Self-perceived health status in 2004 | |||
Poor | 9 (1.0%) | 29 (1.5%) | 38 (1.3%) |
Fair | 72 (8.1%) | 174 (8.7%) | 246 (8.5%) |
Good | 310 (35.1%) | 800 (40.0%) | 1110 (38.5%) |
Very good | 278 (31.4%) | 587 (29.4%) | 865 (30.0%) |
Excellent | 215 (24.3%) | 408 (20.4%) | 623 (21.6%) |
Exit wave | |||
Mean (SD) | 3.85 (1.63) | 3.35 (1.23) | 3.50 (1.38) |
Median (Min, Max) | 4.00 (2.00, 6.00) | 3.00 (2.00, 6.00) | 3.00 (2.00, 6.00) |
Age in 2004 | |||
Mean (SD) | 52.3 (4.01) | 56.9 (4.22) | 55.5 (4.67) |
Median (Min, Max) | 52.0 (38.0, 76.0) | 57.0 (47.0, 82.0) | 55.0 (38.0, 82.0) |
Grandchildren in 2004 | |||
Yes | 669 (75.7%) | 1084 (54.3%) | 1753 (60.8%) |
No | 215 (24.3%) | 914 (45.7%) | 1129 (39.2%) |
Paid rent or mortgage in 2004 | |||
No | 231 (26.1%) | 828 (41.4%) | 1059 (36.7%) |
Yes | 653 (73.9%) | 1170 (58.6%) | 1823 (63.3%) |
Years in education in 2004 | |||
Mean (SD) | 12.6 (3.71) | 12.0 (3.81) | 12.2 (3.79) |
Median (Min, Max) | 13.0 (0, 21.0) | 12.0 (0, 21.0) | 12.0 (0, 21.0) |

Covariate | Before Stratification: Chi-Squared | Before Stratification: p | After Stratification 1: Chi-Squared | After Stratification 1: p | After Stratification 2: Chi-Squared | After Stratification 2: p |
---|---|---|---|---|---|---|
country | 12.8925 | 0.1675 | 18.1786 | 0.033 | ||
gender | 8.7456 | 0.0031 | ||||
age | 42.5487 | <0.00001 | ||||
sphus | 0.0466 | 0.8297 | 0.4119 | 0.521 | 0.38866 | 0.53 |
grchild_bin | 0.4912 | 0.4834 | 0.0931 | 0.760 | 0.02468 | 0.88 |
yedu | 0.6002 | 0.4385 | 0.0573 | 0.811 | 0.03024 | 0.86 |
rent_or_mortgage | 0.0159 | 0.8996 | 0.4463 | 0.504 | 0.00726 | 0.93 |
Global | 54.9196 | <0.00001 | 18.8434 | 0.128 | 0.52460 | 0.97 |

t | u = 2 | u = 3 | u = 4 | u = 5 | u = 6 |
---|---|---|---|---|---|
1 | o | o | o | o | o |
2 | | o | o | o | o |
3 | | | o | o | o |
4 | | | | o | o |
5 | | | | | o |
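
The marked cells in the t/u table can be read as the set of (t, u) combinations that were evaluated. The following is a minimal sketch that enumerates them, assuming (as the figure captions suggest) that every pair of time points between 1 and 6 with t < u is included. The variable names are illustrative and do not come from the original study.

```python
from itertools import combinations

# Assumption: time points run from 1 to 6, read off the table's
# row labels (t = 1..5) and column labels (u = 2..6).
time_points = range(1, 7)

# All pairs (t, u) with t < u; combinations() yields them in sorted order.
pairs = list(combinations(time_points, 2))

# Rebuild the grid: one row per t, one column per u, "o" where t < u.
for t in range(1, 6):
    cells = ["o" if (t, u) in pairs else "." for u in range(2, 7)]
    print(t, " ".join(cells))

print("number of (t, u) combinations:", len(pairs))
```

Under this assumption the grid is upper triangular, which is consistent with the panels listed in the figure captions (five panels for t = 1, four for t = 2, and so on).
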

Covariate | Hazard Ratio (95% CI) | p |
---|---|---|
gender | 1.00 (0.92–1.10) | 0.8971 |
sphus | 0.90 (0.86–0.94) | <0.0001 |
grchild_bin | 1.28 (1.16–1.41) | <0.000001 |
yedu | 0.98 (0.97–0.99) | <0.01 |
rent_or_mortgage | 0.82 (0.74–0.91) | <0.001 |

Time | Cox (TDCs) | Cox (no TDCs) | RSF (no TDCs) |
---|---|---|---|
T = 2 | 0.5351 | 0.4985 | 0.5211 |
T = 3 | 0.6131 | 0.5556 | 0.5706 |
T = 4 | 0.6842 | 0.5463 | 0.5370 |
T = 5 | 0.5900 | 0.4390 | 0.4878 |
T = 6 | 0.5333 | 0.5030 | 0.5162 |
Average | 0.5909 | 0.5084 | 0.5265 |

Covariate | Average Coefficient | Average p-Value | Median p-Value |
---|---|---|---|
gender | 1.0009 | 0.6066 | 0.6131 |
sphus | −0.09056 | 0.0270 | 0.0053 |
grchild_bin | 1.2829 | 0.0034 | 0.0004 |
yedu | −0.9822 | 0.1119 | 0.4912 |
rent_or_mortgage | −0.8223 | 0.0349 | 0.0093 |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Garibay, M.G.; Srakar, A.; Bartolj, T.; Sambt, J. Does Machine Learning Offer Added Value Vis-à-Vis Traditional Statistics? An Exploratory Study on Retirement Decisions Using Data from the Survey of Health, Ageing, and Retirement in Europe (SHARE). Mathematics 2022, 10, 152. https://doi.org/10.3390/math10010152