Abstract
As machine learning (ML) systems become pervasive in high-stakes applications, ML fairness is receiving increasing attention. A wide variety of fair ML solutions have been developed to ensure that bias and inaccuracies in the data and model do not lead to decisions that treat individuals unfavorably on the basis of sensitive characteristics. While most of the fair ML literature focuses on classification and regression settings, fairness in survival analysis for time-to-event outcomes is under-explored. In contrast to existing fair survival analysis solutions, which typically incorporate fairness constraints into the learning mechanism, we propose several pre-processing and post-processing approaches. Because pre-processing and post-processing methods are model-agnostic, they may offer more flexible fairness interventions. Additionally, they tend to be more intuitive and explainable than in-processing methods. We carry out experimental studies on medical and non-medical data sets to evaluate the proposed fairness methods.
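To make the model-agnostic point concrete, the sketch below shows one generic pre-processing idea: reweighing in the style of Kamiran and Calders, applied here to the event indicator of a survival data set. It is an illustration under stated assumptions and not the method proposed in this chapter; the function names `reweigh` and `weighted_event_rate`, the toy data, and the choice to ignore event times and treat censoring as a plain binary label are all assumptions made for brevity.

```python
from collections import Counter


def reweigh(groups, events):
    """Return one weight per record so that, in the weighted sample,
    the sensitive group and the event indicator are independent.

    Hypothetical helper: w(g, e) = P(g) * P(e) / P(g, e), with all
    probabilities estimated from counts. Event times are ignored.
    """
    n = len(groups)
    group_counts = Counter(groups)
    event_counts = Counter(events)
    joint_counts = Counter(zip(groups, events))
    return [
        (group_counts[g] * event_counts[e]) / (n * joint_counts[(g, e)])
        for g, e in zip(groups, events)
    ]


def weighted_event_rate(groups, events, weights, group):
    """Weighted share of observed events (event == 1) within one group."""
    total = sum(w for g, w in zip(groups, weights) if g == group)
    hits = sum(w for g, e, w in zip(groups, events, weights)
               if g == group and e == 1)
    return hits / total


# Toy data (assumed): group "a" has more observed events than group "b".
groups = ["a", "a", "a", "b", "b", "b"]
events = [1, 1, 0, 1, 0, 0]

weights = reweigh(groups, events)
rate_a = weighted_event_rate(groups, events, weights, "a")
rate_b = weighted_event_rate(groups, events, weights, "b")
print(weights, rate_a, rate_b)
```

Because the weights are computed from the data alone, they can in principle be passed to any survival learner that accepts per-sample weights, which is the sense in which such a step is model-agnostic.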
Ethics declarations
Ethical Statement
This research explores discrimination-mitigation processing techniques (both pre-processing and post-processing) in time-to-event analysis. The following ethical considerations have been addressed in the research design and procedures:
1. Voluntary participation: This study does not involve primary data collection.
2. Confidentiality: All data used in this study are publicly available and are properly cited.
3. Potential for harm: There is minimal risk or potential harm associated with this study.
4. Gender consideration: This study investigates potential biases and unfairness associated with using gender as a feature in survival analysis and proposes several approaches to address the unfairness.
5. Results communication: The study’s results will serve academic purposes exclusively and will be reported in academic journals or conferences. The results will be presented accurately and without bias, acknowledging any limitations or ethical dilemmas encountered throughout the study.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhao, Z., Ng, T.L.J. (2023). Fairness-Aware Processing Techniques in Survival Analysis: Promoting Equitable Predictions. In: De Francisci Morales, G., Perlich, C., Ruchansky, N., Kourtellis, N., Baralis, E., Bonchi, F. (eds) Machine Learning and Knowledge Discovery in Databases: Applied Data Science and Demo Track. ECML PKDD 2023. Lecture Notes in Computer Science, vol 14174. Springer, Cham. https://doi.org/10.1007/978-3-031-43427-3_28
DOI: https://doi.org/10.1007/978-3-031-43427-3_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43426-6
Online ISBN: 978-3-031-43427-3
eBook Packages: Computer Science, Computer Science (R0)