DOI: 10.1145/3578245.3584935
Research article · Open access

Heterogeneous Datasets for Federated Survival Analysis Simulation

Published: 15 April 2023

Abstract

Survival analysis studies techniques for modeling the time until an event of interest occurs in a population. It has found widespread application in healthcare, engineering, and the social sciences. However, the data needed to train survival models are often distributed, incomplete, censored, and confidential. In this context, federated learning can substantially improve the quality of models trained on distributed data while preserving user privacy. Federated survival analysis is still in its early development, though, and there is no common benchmarking dataset for testing federated survival models. This work provides a novel technique for constructing realistic heterogeneous datasets, in a reproducible way, from existing non-federated datasets. Specifically, we propose two dataset-splitting algorithms based on the Dirichlet distribution that assign each data sample to a carefully chosen client: quantity-skewed splitting and label-skewed splitting. These algorithms yield different levels of heterogeneity by changing a single hyperparameter. Finally, numerical experiments provide a quantitative evaluation of the heterogeneity level using log-rank tests, together with a qualitative analysis of the generated splits. The implementation of the proposed methods is publicly available to support reproducibility and to encourage common practices for simulating federated environments for survival analysis.
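
To make the idea of Dirichlet-based splitting concrete, the following is a minimal sketch and not the authors' published implementation. The function names (`dirichlet_sample`, `quantity_skewed_split`, `label_skewed_split`), the use of quantile bins of the event time as "labels", and the default of four bins are all assumptions made for illustration. A single concentration parameter `alpha` plays the role of the heterogeneity hyperparameter.

```python
import random


def dirichlet_sample(alpha, k, rng):
    """Draw one sample from a symmetric Dirichlet(alpha, ..., alpha) of size k."""
    # A Dirichlet sample is a vector of independent Gamma draws, normalized.
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    if total == 0.0:  # numerical underflow for very small alpha
        hot = rng.randrange(k)
        return [1.0 if i == hot else 0.0 for i in range(k)]
    return [d / total for d in draws]


def quantity_skewed_split(n_samples, n_clients, alpha, seed=0):
    """Assumed sketch: client sizes follow Dirichlet-distributed proportions."""
    rng = random.Random(seed)
    proportions = dirichlet_sample(alpha, n_clients, rng)
    # Each sample independently picks a client according to the proportions.
    owners = rng.choices(range(n_clients), weights=proportions, k=n_samples)
    split = [[] for _ in range(n_clients)]
    for idx, client in enumerate(owners):
        split[client].append(idx)
    return split


def label_skewed_split(times, n_clients, alpha, n_bins=4, seed=0):
    """Assumed sketch: bin event times by quantile, then spread each bin
    over the clients with its own Dirichlet-distributed proportions."""
    rng = random.Random(seed)
    n = len(times)
    order = sorted(range(n), key=times.__getitem__)
    # Consecutive slices of the time-sorted indices act as quantile bins.
    bins = [order[b * n // n_bins:(b + 1) * n // n_bins] for b in range(n_bins)]
    split = [[] for _ in range(n_clients)]
    for members in bins:
        proportions = dirichlet_sample(alpha, n_clients, rng)
        owners = rng.choices(range(n_clients), weights=proportions, k=len(members))
        for idx, client in zip(members, owners):
            split[client].append(idx)
    return split


if __name__ == "__main__":
    sizes = [len(c) for c in quantity_skewed_split(1000, 5, alpha=0.5, seed=42)]
    print("client sizes:", sizes)
```

In this sketch, a small `alpha` concentrates the Dirichlet mass on a few clients and so produces strongly skewed splits, while a large `alpha` pushes the proportions toward uniform and the split toward a homogeneous one.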

Cited By

  • (2024) Methodology of solving the feature selection problem for the Cox regression model. Vestnik of Astrakhan State Technical University. Series: Management, computer science and informatics, 2024:3 (85-94). DOI: 10.24143/2072-9502-2024-3-85-94. Online publication date: 29-Jul-2024
  • (2024) Algorithm for Constructing the Hazard Function of the Extended Cox Model and its Application to the Prostate Cancer Patient Database. Advanced Engineering Research (Rostov-on-Don), 24:4 (413-423). DOI: 10.23947/2687-1653-2024-24-4-413-423. Online publication date: 25-Dec-2024
  • (2024) FlocOff: Data Heterogeneity Resilient Federated Learning With Communication-Efficient Edge Offloading. IEEE Journal on Selected Areas in Communications, 42:11 (3262-3277). DOI: 10.1109/JSAC.2024.3431526. Online publication date: Nov-2024
  • (2023) Federated Survival Forests. 2023 International Joint Conference on Neural Networks (IJCNN) (1-9). DOI: 10.1109/IJCNN54540.2023.10190999. Online publication date: 18-Jun-2023

      Published In

      ICPE '23 Companion: Companion of the 2023 ACM/SPEC International Conference on Performance Engineering
      April 2023
      421 pages
      ISBN:9798400700729
      DOI:10.1145/3578245
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. datasets
      2. federated learning
      3. survival analysis

      Qualifiers

      • Research-article

      Conference

      ICPE '23

      Acceptance Rates

      Overall Acceptance Rate 252 of 851 submissions, 30%
